High density linkage map construction and QTL mapping for runner production in allo-octoploid strawberry Fragaria × ananassa based on ddRAD-seq derived SNPs

Recent advances in high-throughput genome sequencing technologies are now making the genetic dissection of the complex genome of cultivated strawberry easier. We sequenced Maehyang (short-day cultivar) × Albion (day-neutral cultivar) crossing populations using double digest restriction-associated DNA (ddRAD) sequencing technique that yielded 978,968 reads, 80.2% of which were aligned to strawberry genome allowing the identification of 13,181 high quality single nucleotide polymorphisms (SNPs). Total 3051 SNPs showed Mendelian segregation in F1, of which 1268 were successfully mapped to 46 linkage groups (LG) spanning a total of 2581.57 cM with an average interval genetic distance of 2.22 cM. The LGs were assigned to the 28 chromosomes of Fragaria × ananassa as determined by positioning the sequence tags on F. vesca genome. In addition, seven QTLs namely, qRU-5D, qRU-3D1, qRU-1D2, qRU-4D, qRU-4C, qRU-5C and qRU-2D2 were identified for runner production with LOD value ranging from 3.5–7.24 that explained 22–38% of phenotypic variation. The key candidate genes having putative roles in meristem differentiation for runnering and flowering within these QTL regions were identified. These will enhance our understanding of the vegetative vs sexual reproductive behavior in strawberry and will aid in setting breeding targets for developing perpetual flowering and profuse runnering cultivar.

and asexual reproduction period overlaps in summer 4,5,9 . Evidences from both wild and cultivated strawberry show that flowering and runnering are genetically distinct but mutually exclusive processes as the predominance of one over the other is intricately related with genetic and environmental factors [3][4][5]7 . Longer days and higher temperatures promote runnering in SF genotypes, which in short and cooler days produce branch crowns that are differentiated to inflorescence and hence, increases flowering and crop productivity 5,10,11 . Higher temperatures also increase runnering in PF genotypes, however, the effect of photoperiod were found to be variable by different experiments 3,[12][13][14] . PF genotypes are, in general, preferred for an extended commercial harvest season of fresh berries. However, the poor runnering habit of the PF genotypes compared to SF genotypes hinders their propagation, making it one of the prioritized breeding targets which require wider genetic understanding of the trait 3,[15][16][17] .
The studies on the diploid strawberry identified that the PF and RU trait are controlled by different alleles 18 , while in cultivated strawberry, common genetic control with opposing effect has been reported 3 . Fewer studies reported single dominant gene [19][20][21] , while many studies favored the polygenic control of the PF trait in cultivated strawberry 17,22 . Developing the genetic maps and fine mapping of the loci in the genome will be helpful to genetically disentangle the trait. However, the complex allo-octoploid (2n = 8x = 56) genome of cultivated strawberry, which consists of four relatively similar sub-genomic chromosome sets from diploid donors posed as a constraint in this regard [23][24][25][26] . Several studies have constructed linkage maps using different suite of molecular markers such as RAPD, AFLP and SSR, etc. [27][28][29][30][31][32] and few of those studies reported QTLs with various additive effects for PF and RU traits in cultivated strawberry 3,4,15,17,[33][34][35] . However, these markers largely suffer from either non-transferability across investigations, insufficient genome-wide coverage, poor density or higher developmental and screening costs 3,15,36 .
Recent release of strawberry virtual reference genome and advances in reduced sequencing technologies such as ddRAD-seq (restriction site associated DNA sequencing), DArT-Seq (Diversity Arrays Technology) and GBS (genotyping by sequencing) offer the potentiality of rapid and cost effective detection of genome wide single nucleotide polymorphisms (SNPs) that can be used in construction of high density linkage maps and fine mapping of the traits of interest 23,26,37 . Such studies have recently been reported in few polyploid species such as wheat, Brassica napus, sugarcane, sweet potato and peanut etc. 38,39 . Very recently, the efficacy of DArT 24 , ddRAD-seq 40 and GBS 41 technologies have been reported for developing SNP based linkage maps in cultivated strawberry which showed the way for development of linkage maps with increased marker density and increased genomic span. None of these studies, however, attempted to identify QTLs using these SNP based linkage maps. We thus primarily aimed at developing high density and greater genome covering linkage map using ddRAD-seq derived SNPs and in addition, mapping QTLs for runner production in a population raised by crossing parents with contrasting runner producing capability which will be helpful in understanding the genetics of runnering in strawberry.

Results
ddRAD-seq based sNp discovery and genotyping. The ddRAD (PstI and MspI) representation libraries, prepared from the genomic DNA of Maehyang × Albion (M × A) populations were successfully sequenced using the Illumina HiSeq platform. The paired-end sequencing of individuals generated a total of 61,065,374 reads (10 GB sequence data). On average, 978,968 high quality reads per sample (80.2% of total reads) were aligned against the Fragaria × ananassa genome version FAN_r1.1 (Fig. 1). The statistics of raw reads, cleaned reads, reads mapped to the reference genome and alignment ratio for individual accessions were summarized in Supplementary Table S1. The mapped reads were further investigated to select a total of 13,181 high quality SNPs based on strict SNP filtration criteria (SNP quality score ≥ 999, minimum depth = 5, minimum allele frequency = 0.05 and minimum proportion of missing data = 0.5) through VCFTools program. The SNPs were distributed as follows, 678 (5%) C/A, 1914 (14%) G/A, 508 (4%) G/C, 889 (7%) T/A, 1765 (13%) T/C and 683 (5%) T/G ( Fig. 2A). Cumulatively, the SNP transitions (A/G or C/T) were higher (3728) compared to transversions (G/T, A/C, A/T or C/G) (3016) and the transition/tranversion ratio was 1.23 (Fig. 2B). Overall, the highest SNP frequencies were observed for transition C/T (1941) followed by A/G (1787). In addition, transition/tranversion (TS/TV) ratio of >0.5 was used for calculating divergence and restructuring the phylogenetic tree 42 . The high quality SNPs were annotated using SnpEff tool (version 4.4) and the generated database was used to annotate possible SNPs based on their genomic, exonic and functional classes. A total of 6,094 SNPs (34.7%) were exonic while 1,844 (10.5%) and 3,245 (18.7%) were intronic and intergenic SNPs, respectively ( Supplementary Fig. S1). The lowest number of SNPs (1.9%) were located in the 5′UTR region followed by 4.3% (759) located in 3′UTR region.   Fig. 3). The remaining markers that were not included were either not linked to any of the recognized LGs, were mapped in small groups or showed conflicting segregation pattern with other markers of the same linkage group at the selected LOD threshold.  Table 1 and Supplementary Table S4 and the corresponding gene IDs, the SNP positions and genotypes are shown in Supplementary Table S5. High degree of collinearity between the genetic distances of the mapped SNP markers of each linkage group and their corresponding physical position on the F. vesca chromosomes were observed for most of the LGs except for few shorter and less dense LGs such as LG1B2, LG2D2, LG4B1, LG5A1 and LG5A2 ( Supplementary Fig. S2). phenotypic evaluation. The two cultivars exhibited significant variation (p < 0.001) in terms of number of runner production in one growing season ( Fig. 4A-C). The seasonal flowering cultivar 'Maehyang' produced 15 runners while the perpetual flowering cultivar produced only one runner during the entire growing season (Fig. 4C). In the segregating F 1 population derived from crossing these two contrasting cultivars, continuous variation was observed for total number of runners (Fig. 4D). The minimum (0) and the maximum (15) number of runners were produced by one and two F 1 lines, respectively while a moderate 8-10 runners were produced by 19 F 1 lines (Fig. 4D).
Candidate genes within the QTL regions. Altogether 587 genes were found within the seven QTL regions, with a maximum of 196 genes within QTL qRU-5C and minimum of 19 genes within QTL qRU-5D and qRU-1D2 each (Supplementary Table S6). Among these, the key candidate genes having putative roles in the vegetative vs reproductive differentiation of shoot apical meristems and regulation of flowering are listed in supplementary Table S7. For example, WUSCHEL related homeobox 1 (FvH4_5g17270) and AGAMOUS-like 71 (FvH4_4g20680) within QTL qRU-4D, CLAVATA3/ESR (CLE)-related protein coding gene FvH4_2g19210  n/a n/a n/a n/a 184.33 Mb d Table 1. www.nature.com/scientificreports www.nature.com/scientificreports/ within qRU-4C and Subtilisin-like protease (FvH4_1g17150, FvH4_1g17160 and FvH4_1g17170) within qRU-1D2 may have roles in the CLAVATA-WUSCHEL signaling pathway that specifies meristematic stem cell fate during differential of shoot and floral buds. Several genes encoding cytokinin dehydrogenase 7 (FvH4_2g30990, FvH4_2g31000 and FvH4_2g31010), auxin response factor 17-like (FvH4_2g19860), auxin efflux carrier family protein (FvH4_5g173100), NAC domain-containing protein 86 (FvH4_5g18130) and homeobox protein knotted-1-like gene KNOXI (FvH4_5g17670) could be involved in cytokinin-auxin mediated meristem differentiation. In addition, a total of 12 tetratricopeptide repeat (TPR)-like superfamily protein genes such as FvH4_5g32370; FvH4_5g32380 and FvH4_2g30790 etc. are found within all but qRU-3D1 QTLs which may have roles in the maintenance of meristem cell organization. Two phytochrome related genes, FvH4_4g19750 and  www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
The intricate interplay between vegetative and sexual reproduction with the regards to the environmental cues have both biological and economic significance. This makes strawberry an important model system to study flowering behavior. Runner and flower arise from different meristem crowns (basal and terminal, respectively) based on competitive investment of resources triggered mainly by environmental factors 4,7,13,43 . Genetics of these two mutually exclusive and antagonistic processes of wild and cultivated strawberry has long been investigated and several hypotheses, mostly favoring polygenic over monogenic control has been put forwarded by previous studies 2, 5,15,18,19,44 . Inheritance studies and mapping of the traits of interest at molecular level necessitated the construction of linkage maps. Here, we report the construction of a linkage map and identification of QTLs for number of runners based on the SNPs detected in the F 1 progeny developed from the cross of profuse and poor runnering cultivars, Maehyang and Albion, respectively by aligning the ddRAD-seq reads against strawberry genome (FAN_r1.1).
History of construction of LGs in strawberry followed distinct phases, starting from the initial low density and non-transferrable marker based LGs 17,27 followed by the use of a comprehensive suite of sequence characterized and transferrable markers 21,28,32 which were further updated upon the release of the diploid F. vesca genome and octoploid Fragaria × ananassa draft genome 29,36,45 and finally the development of high-throughput SNP based LGs 24,40,41,46 . The first genetic linkage map in octoploid strawberry were constructed on the full-sib progeny of 'Capitola' × 'CF1116' , where 235 and 280 single dose restriction fragment (SDRF) markers were mapped on the 43 co-segregating groups spanning a total of 1604 and 1494 cM in female and male parents, respectively 27 . Rousseau-Gueutin et al. 32 included an additional 306 SSR and 5 STS and SCAR markers in the original AFLP marker set of Lerceteau-Köhler et al. 27 Table S4). The highest 20 markers (out of 73) of LG5D were mapped to a different chromosome followed by LG4C (18 out of 51). This is in agreement with the report of highly conserved macro-synteny between diploid and octoploid strawberry with few inter-chromosome rearrangements 25,29,32 . These high degree of collinearity further indicates the reliability of the constructed linkage map. However, it is noteworthy that 41% (1268) of the total markers (3051) which showed significant Mendelian segregation ratio were assigned to LGs and there are five LGs with less than 15 markers and two LGs with a longest gap of ~20 cM. This could be attributed to the stringent analysis parameters such as a high LOD (>5) aiming at constructing a representative linkage map by avoiding spurious linkage that can facilitate accurate detection of QTL.
Using the SNP marker based linkage group, we are the first to report QTL in Fragaria × ananassa. Besides the composite interval mapping, we used multi-locus GWAS methods such as mrMLM, FASTmrMLM, pLARmEB and ISIS EM-BLASSO which is shown to detect minor effect QTLs [47][48][49][50] . Altogether, seven QTLs, two with relatively high LOD scores (>7.0) were identified which further strengthens the polygenic control of runnering hypothesis 8,17 . Gaston et al. 3 demonstrated common genetic control of flowering and runnering by a major dominant locus (FaPFRU) which positively influences flowering and exerts a negative effect on runnering. Of the seven QTLs identified in this study, five were negative effect QTLs. Using SSR marker derived LGs, a single major QTL for runnering and perpetual flowering reported by Castro et al. 15 was further confirmed by Sooriyapathirana et al. 33 using an extended population. Using three separate Japanese population, Honjo et al. 35 mapped the PF trait to similar region of previously detected QTL regions.
The breeding industry requires both perpetual flowering and profuse runnering traits but the PF genotypes are poor runner producers. Several putative candidates for PF have already been reported in wild and cultivated strawberry such as TFL1, KSN, FT1, FT2, FT3, FvSOC1 and TCP7 34,[51][52][53][54][55] . However, their exact roles in the intricate interplay between meristem differentiation leading to inflorescence or runners are yet to be fully established. In this study, we have mined the genes within the QTL regions and identified several key candidate genes having putative roles in CLAVATA-WUSCHEL signaling, cytokinin-auxin mediated meristem differentiation, directional cell division, meristem stability and organogenesis, gravitropism, vernalization and photoperiodic regulation of flowering etc. that ultimately determines the stem cell fate towards shoot or floral meristem differentiation [56][57][58][59] (Table S7). Functional characterization of these genes will enhance our understanding of the complex interplay between vegetative vs sexual propagation in strawberry. In addition, the sequence polymorphisms such as the peak QTL SNP markers identified in this study could be used for marker assisted breeding ( Table 2). Very recently, the gene FveRGA1 (gene06210 in F. vesca_v1.0) encoding a putative DELLA protein GAIP-B has been shown to play a role in stolon production in diploid strawberry 60   www.nature.com/scientificreports www.nature.com/scientificreports/ our study population. However, one SNP was found in their closest genetic locus FAN_iscf00207522.1 (Fvb4: 32318115..32316672) in the linkage group LG4D at 19.19 cM apart from our qTL qRU-4D.
In conclusion, we report a linkage map with higher density and greater genome-wide span using the ddRAD-seq derived SNPs from the progeny of two strawberry cultivars with contrasting flowering and runnering attributes. In addition, we have also identified QTLs for total number of runners. This high-throughput genotyping based linkage map will serve as a reference for precise sequence scaffold anchoring and orientations in this species and the approach can be replicated in other genetically complicated species. The candidate genes identified within the QTLs, upon functional validation, can be targeted for biotechnological manipulations to develop genotypes with desirable flowering and runnering habit. Besides, the large number of SNPs will supply abundant choices of transferrable markers for future genetic studies and will assist in identifying QTLs, causal genes and linked markers for agronomically important traits which in turn will accelerate strawberry genetic improvement programs via molecular breeding.  Table S1). The adaptor-ligated DNA amplicons were pooled and DNA fragments of 300-900 base pair length were separated with BluePippin (Sage Science, Beverly, MA, USA). The constructed ddRAD-seq libraries were sequenced on HiSeq platform (Illumina, USA) using 93 base paired-end (PE) mode as described by Shirasawa et al. 38 .

Materials and
sequence data analysis and sNp detection. Primary data processing of ddRAD-seq was performed as per the procedures described in Shirasawa et al. 38 with minor modifications. The sequencing data of strawberry accessions were examined for their quality using FastQC tool (http://www.bioinformatics.babraham. ac.uk/projects/fastqc/). The low-quality sequences and adaptor sequences were trimmed using PRINSEQ (http:// prinseq.sourceforge.net) 61 and fastx_clipper in FASTX-Toolkit (version 0.10.1; http://hannonlab.cshl.edu/fastx_ toolkit), respectively. The obtained high quality reads from each accession were then mapped to the genome of Fragaria × ananassa (FAN_r1.1, http://strawberry-garden.kazusa.or.jp/) as reference using Bowtie 2 tool (version 2.1.0; parameters: -minins 100 -no-mixed) 62 . The resulting sequence alignment/map format (SAM) files were converted into Binary Alignment Map (BAM) files and thereafter, SAMtools (version 0.1.19; parameters: -Duf) was used for sorting, indexing and removal of duplicates 63 . Thereafter, genomic variants (SNPs) were called out for each strawberry accessions against reference genome using mpileup module from SAM tools (parameters: -Duf) and BCF tools (parameters: -vcg). In addition, the produced variant call format (VCF) files including SNP details were further filtered as per procedures described by Shirasawa et al., (2017) using VCFtools (version 0.1.11) 64 . Furthermore, missing data were imputed using Beagle4 software package (version 5.0) 65 . The effect of SNP annotations on gene functions were predicted using SnpEff tool (version 4.11) 66 .
Construction of linkage map. The linkage analysis was performed using the SNP-based markers that were homozygous in one parent and heterozygous in the other parent (segregation type code lm × ll and nn × np in JoinMap version 4.1) and heterozygous in both parents (segregation type code hk × hk). Markers with more than 10% missing data and markers that did not fit significantly with the segregation ratio of 1:1 or 1:2:1 in the progeny as per the chi-square test of goodness of fit (p < 0.05) were excluded from map construction. The regression mapping algorithm and Kosambi's mapping function with a maximum recombination fraction of 0.45, goodness-of-fit jump threshold of 5 and a ripple value of 1 were used for calculating marker order and the map distances were expressed in centi-Morgans (cM) using JoinMap v4.1 (Kyazma, Wageningen, The Netherlands). Linkage groups with a minimum logarithm of odds (LOD) score limit of 5.0 were visualized in MapChart (version 2.32) 67 . All the linkage groups except the smaller ones (<10 markers) were mapped on the F. vesca (genome v4.0.a1) genome. The LGs were named LG1 to LG7 based on the number of F. vesca chromosome to which the marker associated sequences of each of the LGs were mapped. The suffixes ' A'-'D' were assigned to the LGs mapped to a particular chromosome based on their sequential physical position throughout the length of that chromosome. If any two small linkage groups were found to be mapped to a particular segment of a F. vesca chromosome sequentially, they were considered as one linkage group.
www.nature.com/scientificreports www.nature.com/scientificreports/ QTL identification. QTLs were mapped based on the marker and phenotypic information of the parents and each individual of the mapping population by the composite interval mapping (CIM) analysis with the following parameters (linkage map method = 10, segregation test size = 0.01, linkage test size = 0.35, mapping function = Kosambi, objective function = SAL and step size = 1 cM) using Windows QTL Cartographer (version 2.5_011) program 68 . In addition, QTLs were also identified using several 'multi-locus mixed linear model' such as mrMLM, FASTmrMLM, pLARmEB and ISIS EM-BLASSO methods in R program [47][48][49][50] . Putative QTLs were declared based on the significant logarithm of odds (LOD) threshold determined by 1000 permutations 69 . The square of the partial correlation coefficient (R 2 ) indicates the proportion of phenotypic variation explained by a QTL. The genes along with their putative functions that lie within the QTL regions were identified against Fragaria vesca annotated genome version v4.0.a1 using the Fragaria × ananassa genome IDs corresponding to the flanking markers of the QTLs as blast queries.

Data Availability
The ddRAD-seq raw reads generated in this study were deposited into Sequence Read Archive (SRA) under NCBI accession PRJNA478299 accessible from (https://www.ncbi.nlm.nih.gov/sra/PRJNA478299).