High-density SNP-based QTL mapping and candidate gene screening for yield-related blade length and width in Saccharina japonica (Laminariales, Phaeophyta)

Saccharina japonica is one of the most important marine crops in China, Japan and Korea. Candidate genes associated with blade length and blade width have not yet been reported. Here, based on SLAF-seq, the 7627 resulting SNP loci were selected for genetic linkage mapping to 31 linkage groups with an average spacing of 0.69 cM, and QTL analyses were performed to map the blade length and blade width phenotypes of S. japonica. In total, 12 QTLs contributing to blade length and 10 to width were detected. Some QTL intervals were detected for both blade length and width. Additive alleles for increasing blade length and width in S. japonica came from both parents. After the QTL interval regions were comparatively mapped to the current reference genome of S. japonica (MEHQ00000000), 14 Tic20 (translocon on the inner envelope membrane of chloroplast) genes and three peptidase genes were identified. RT-qPCR analysis showed that the transcription levels of four Tic20 genes were different not only in the two parent sporophytes but also at different cultivation times within one parent. The SNP markers closely associated with blade length and width could be used to improve the selection efficiency of S. japonica breeding.


Materials and Methods
Construction of BC 1 F 2 mapping population. The kelp germplasm used in this study was maintained at the Key Laboratory of Experimental Marine Biology of Institute of Oceanology, Chinese Academy of Sciences, Qingdao 29 . The "860" variety of S. japonica and the species S. longissima were selected as parents for construction of the mapping population. Variety "860" has ever been used extensively for cultivation and has a blade length of 240.2 ± 23.1 cm and a blade width 25.4 ± 3.0 cm 10 . S. longissima has a longer blade of usually 7~8 meters; thus, the species has often been used to breed new hybrids 11,30 . In brief, the male gametophyte clone of "860" was crossed with the female gametophyte clone of S. longissima to produce the F 1 hybrid generation. Sporelings were cultured in the laboratory until the blade length reached 1~2 cm, after which the seedlings were moved to sea cultivation. When the F 1 hybrids reached maturity in August, female and male gametophyte clones were isolated from one F 1 hybrid sporophyte and preserved. Then, the "860" male gametophyte was crossed with F 1 hybrid female gametophyte clones to develop the BC 1 F 1 population. One mature sporophyte was chosen from the BC 1 F 1 population and then self-fertilized in the laboratory to produce the BC 1 F 2 mapping population, which was cultivated in the sea (E 122°62′, N 37°22′). During the cultivation, the blade length and width of 178 individuals were measured on April 17th, May 8th, May 22th and June 9th. The blade length was measured from the top of the stipe to the blade tip. The width was measured at the widest part of the blade. DNA extraction. The gametophytes from the male parent, "860", and the female parent, S. longissima, as well as the 178 sporophytes from the BC 1 F 2 mapping population were used for genomic DNA isolation. Each sample was ground into a powder using a high-speed homogenizer, and genomic DNA was extracted and purified according to the improved CTAB (cetyltrimethyl ammonium bromide) method 31 . The purity and concentration of the extracted DNA were determined using both an ND-1000 spectrophotometer (NanoDrop, Wilmington, DE, USA), and electrophoresis in an 0.8% agarose gel with a lambda DNA standard marker. SLAF library preparation and sequencing. The parental gametophytes and 178 sporophyte progeny were genotyped using the reported SLAF method with minor modifications 25 . The draft genome sequences of the kelp were in silico digested (GenBank Accession Number: MEHQ00000000). HaeIII and RsaI were selected as the fittest restriction endonucleases for SLAF analysis. For SLAF library construction, the extracted genomic DNA was first incubated at 37 °C with HaeIII (New England Biolabs, NEB), T4 DNA ligase (NEB), and HaeIII adapter. Then, RsaI was added for additional digestion at 37 °C. The reaction-ligation reactions were inactivated by incubating at 65 °C. Then, PCR was performed using diluted restriction-ligation samples, dNTPs, Taq DNA polymerase (NEB) and PCR primers containing barcode 1. The PCR product were purified with Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, UK) and pooled. The pooled sample was purified with a Quick Spin column (Qiagen) and then separated by electrophoresis on a 2% agarose gel. Fragments with sizes of 314~414 bp were excised and purified using a gel extraction kit (Qiagen, Hilden, Germany). Then, the purified products were subjected to PCR amplification with Phusion Masters Mix (NEB) and Solexa Amplification primers mix to add barcode 2. Finally, paired-end sequencing of 2 × 100 bp was performed on the selected SLAFs using an Illumina high-throughput sequencing platform (Illumina, Inc., USA). After sequencing, the Q30 and nucleotide compositions of the raw data were checked for sequencing quality.
Development of SNP markers from the SLAF reads and genotyping. The separate fastq files of SLAF reads for each kelp DNA sample were aligned to the draft reference genome of S. japonica (MEHQ00000000) via the program BWA 32 , with a maximum of 4% sequence mismatch allowed. The alignments were transformed to the SAM (sequence alignment map) format, combined and assigned into one master BAM (binary file of SAM) SCieNTifiC RePoRTS | (2018) 8:13591 | DOI:10.1038/s41598-018-32015-y format using SAMtools 0.1.18 33 . SNPs were detected with the genome analysis toolkit (GATK) using the BAM file; detailed manual for GATK analysis can be found at the website: https://software.broadinstitute.org/gatk/guide/.
The alleles belonging to each SNP marker were evaluated to determine minor allele frequency. For mapping populations of diploid sporophytes of kelp, one SNP locus can contain no more than four genotypes; thus, loci with more than four putative alleles were considered likely to originate from repetitive genome regions and excluded from the subsequent analysis. Therefore, only those SNP loci with 2~4 SLAFs were considered potential markers for construction of linkage maps in kelp. These polymorphic SNP markers were coded into eight segregation patterns as aa × bb, ab × cd, ef × eg, hk × hk, lm × ll, nn × np, ab × cc and cc × ab. Here, the segregation pattern aa × bb was used for genetic linkage map construction.
Construction of linkage map and detection of QTLs for blade length and width. Before construction of the linkage map, the SNP markers were filtered for quality, requiring genotype calls at each SNP to have a depth of coverage of at least four reads in each individual kelp sporophyte and simultaneously excluding markers of significant segregation distortion (P < 0.05). Moreover, the segregation distortion for the SNP markers used for construction of linkage map was also examined with the software DistortedMap 34 .
HighMap was used for genetic linkage map construction. Pairwise modified logarithm of odds (MLOD) scores were used to partition mapping markers into LGs (linkage groups), and those markers with MLOD values < 5 were excluded from marker ordering. For the iterative process of marker ordering, a combination of enhanced Gibbs sampling, spatial sampling and simulated annealing algorithms was applied. The SMOOTH strategy was adopted for error correction according to the parental genotypes. The Kosambi mapping function was used to calculate the map distances between markers. Moreover, a heat map and a haplotype map were constructed to assess the quality of the genetic map 35,36 .
QTLs underlying blade length and blade width were mapped using QTL IciMapping 4.1.0.0 (http://www. isbreeding.net) with the inclusive composite interval mapping module 37 . QTLs with additive and dominance effects were examined with the method ICIM-ADD. The parameters for ICIM-ADD were set as a 1.0 cM scan step, a 0.001 probability used for stepwise regression analysis, 2.5 used for the LOD threshold score for significant QTLs for each trait. Furthermore, QTLs underlying blade length and blade width were also detected with the software QTL.gCIMapping from the R website 38 .
Comparative mapping of candidate genes in detected QTLs. SLAF reads corresponding to the SNP markers in detected QTLs were used in a BLAST analysis against the kelp draft genome sequence (MEHQ00000000). The candidate genes were annotated and analyzed via the NCBI database (http://www.ncbi. nlm.nih.gov).

RNA extraction and RT-PCR analysis.
Blades of the parental sporophytes were collected on the four observation days for RNA extraction. Total RNA was extracted following the method of Yao et al. 39 . The extracted RNA was treated with RNase-free DNase I (TaKaRa, Dalian, China) to remove residual genomic DNA. First-strand cDNA was prepared with the PrimeScript II cDNA synthesis kit (Takara, Tokyo, Japan) according to the manufacturer's instructions and then stored at −80 °C.
RT-qPCR was performed with the SYBR Premix Ex Taq II kit (Takara, Tokyo, Japan) on the TP800 Thermal Cycler Dice (Takara, Tokyo, Japan). Thermal cycling protocol and data analysis were as previously reported 40 .

SLAF analysis and SNP marker development and genotyping.
In total, 364.11 M paired-end reads with a length of 200 bp were obtained from the 51.1 Gb of raw data from the SLAF library. The Q30 ratio was 87.04%, and the guanine/-cytosine content was 46.48%. In total, 13,567,809 reads were generated from the male parent (1.9 Gb of data) and 7,318,321 reads from the female parent (1 Gb). The average read number of the 178 individual sporophytes was 1,831,165. After the reads were mapped to the draft genome of kelp using BWA, we found that approximately 67.73% of total reads from the male parent, 44.18% from the female parent and 79.48% on average from each individual in the mapping population could be mapped to specific regions of the presumed chromosome regions. Ultimately, 1,365,570 polymorphic SLAFs were developed for the parents and the mapping population, of which 443,249 SLAFs were from the male parent, 226,614 from the female parent, and an average of 226,366 from the 178 BC 1 F 2 individuals ranging from 117,848 to 587,670. Therefore, the average coverage for each SLAF marker was 30.6-fold in the male parent, 32.3-fold in the female parent, and an average of 8.0-fold in each individual of the mapping population.
Using SAMtools and GATK for realignment and detection, we identified a total of 1, 092, 944 SNPs from the 364.11 M original reads in the parent gametophytes and 178 BC 1 F 2 sporophytes, of which 507, 994 SNPs appeared in the male parent, 330, 843 SNPs in the female parent, and an average of 292,415 in each of the 178 individuals of the mapping population.
After the low-depth SNPs were filtered out and the parental lines assigned different letters of alphabet to indicate their genotypes, we assessed SNP segregation patterns and found that 97, 189 SNPs out of 1, 092, 944 total polymorphic SNPs were successfully encoded and grouped into four segregation patterns (aa × bb, nn × np, lm × ll, hk × hk). Since the two parents were haploid gametophytes with genotypes of a and b, we selected 75,024 markers that followed the aa × bb segregation pattern for the subsequent linkage analysis. The quality of the genetic map was evaluated with heat maps and haplotype maps generated for each LG. The heat maps showed that there were no ordering errors detected in the 31 LGs of kelp. The heat map for LG6 is provided as additional Fig. S1. The haplotype maps confirmed that most of the recombination blocks in each LG were clearly defined. The haplotype map for LG6 is provided as additional Fig. S2.

QTL analyses for blade length and blade width.
Based on the high-density SNP linkage map and the phenotype data ( Fig. 2; Table 2), we conducted QTL analyses of the blade length and width in kelp on the four days observed: April 17th, May 8th, May 22th and June 9th (Table 3). QTLs detected with LOD values above 2.5 using 1000 permutations were recognized as significant intervals. In total, 12 QTLs were detected for blade length and 10 for blade width (Table 3). Nevertheless, the total number of QTLs was only 18, indicating that some QTLs are likely pleiotropic. For example, the intervals between marker57879 and marker57878, and from LG7 was simultaneously detected for blade length and blade width on April 17. The QTL interval between marker52375 and marker52366 (LG15) was detected for blade width on May 22th and June 9th. In addition, the QTL interval between marker26129 and marker26127 (LG30) was detected for at least one trait on three observation dates. To assess the additive effects of each QTL in the kelp, the parental origins of the alleles at each QTL were evaluated. For qL4-5, the additive allele for increasing blade length originated in the female parent, S. longissima, while the additive allele of qW5-7 for increasing blade width originated in the male parent, "860". Similar relationships were also found for qL10-4 and qW7-15. Interestingly, the allele from the female parent, S. longissima, could increase blade width at qW12-30 detected for width on June 9th, despite the fact that S. longissima has a narrow blade. Additionally, the allele from the male parent, which has a shorter blade, could increase blade length at qL6-24, which was mapped for blade length on April 17th. In addition, some QTLs were detected with obvious dominant values, which indicated that significant dominance existed between the alleles at these QTLs. The phenotypic variance explained by each designated QTL ranged from 5.37% to 18.69%. The most influential QTL interval was between marker26422 and marker53425 in LG24, accounting for 18.69% of the observed phenotypic variance with a LOD value of 5.53 (Table 3). The least effective QTL interval was between marker8613 and marker37947 in LG20 on May 22th, accounting for only 5.37% of the observed phenotypic variance with a LOD value of 2.52. The confidence intervals spanned by each mapped QTL ranged from 0.28 cM to 4.68 cM, with an average of 1.21 cM. The smallest interval was between marker26129 and marker26127 in LG30, and the largest interval was between marker5803 and marker5890 in LG2. The QTLs underlying the blade length and the blade width were also examined with the software QTL.gCIMapping from the R website, which adopted different genetic model and different statistical method for the QTL analysis from that in software QTL IciMapping, and the results were listed in Table S2. Although many QTLs detected by QTL.gCIMapping were different from that in Table 3, some QTLs still remained the same, such as the QTLs 1, 2, 3 for the blade length at April 17 th in Table S2.
Candidate genes associated with blade length and blade width. Through comparison mapping, candidate genes were identified in the QTL intervals between marker26422 and marker53425 in LG24 and between marker26129 and marker26127 in LG30. Fourteen Tic20 genes were annotated between marker26422 and marker53425 in approximately 190 kb (208307~398831 in scaffold 192), and three peptidase genes were  Table S3). The collinear relationship between LG30 and the reference genome including scaffold 182 was evaluated (Fig. S3). The proteins encoded by the Tic20 genes composed the TOC and TIC (translocon at the outer/inner envelope membrane of chloroplasts, respectively) complexes, which are responsible for translocating precursor proteins synthesized in the cytosol across or into the double chloroplast membrane envelope 41 . The enzymes encoded by the three genes of peptidases s8 and s53 belong to clan SB serine peptidases 42 . Transcriptional analysis of four selected Tic20 genes in the parental sporophytes was performed with RT-qPCR on the four growth dates; the results showed that the expression of these genes was different not only between the parent sporophytes but also between the four growth dates (Fig. 3). At the third observation day, for male parent "860" of S. japonica, each of the four Tic20 genes had the highest expression level and the expression level of Tic20-1 was significantly higher than that of female parent S. longissima (16.3 fold, P < 0.01). For Tic20-1, the expression level at the third observation day was significantly higher than that of the second observation day in "860" of S. japonica (7.9 fold, P < 0.01).

Discussions
The collinearity of a high-density genetic linkage map with a reference genome can be used to enable the fine mapping of quantitative traits. In soybean, one map consisting of 5785 SNPs was constructed to identify the genes encoding isoflavone biosynthetic enzymes, with the detected QTL then comparatively mapped to the soybean reference genome 43 . Similarly, a high-density genetic map with 3441 SNP markers was generated for apple, and 80, 64, 17 genes related to fruit weight, fruit firmness and fruit acidity, respectively, were detected, after the relevant QTLs were comparatively mapped to the complete apple genome 44 . A comparative mapping strategy based on next-generation sequencing has also been attempted for localization of one "mut" locus for impaired branching pattern in Ectocarpus siliculosus 45 . Here, in S. japonica, we constructed a high-density genetic linkage map based on 7627 SNPs with an average genetic distance of 0.69 cM. We used this map to perform fine mapping for QTLs related to blade length and blade width in S. japonica. Eventually, 14 Tic20 genes and three peptidase s8 and s53  genes were identified as being associated with blade length and width in S. japonica. Thus, the colinearity of high-density genetic linkage maps with reference genomes can be a feasible and efficient way to identify candidate genes for quantitative traits. A genetic correlation was detected between blade length and blade width in S. japonica in this study. The correlation coefficient between these two traits ranged from 0.80 to 0.92 (Table S4). Similarly, Wang (1984) reported that the phenotype correlation coefficient between blade length and blade width in S. japonica ranged from 0.81 to 0.95 16 . These results provide further evidence that phenotypically correlated traits are often mapped together, such as frond length and width in Pyropia haitanensis 26 and kernel number per spike and thousand-kernel weight in wheat 46 . Liu et al. 20 believed that the QTLs determining blade length and width in S. japonica were linked and located in the same chromosome region, but these authors did not detect a close correlation between blade length and blade width QTLs in their study 20 . In contrast, our study found that many QTLs determining both blade length and width were mapped to the same areas, such as the interval between marker57879 and marker57878 ( Table 3). This indicated that the close phenotypic correlation of blade length and width in S. japonica was due mostly to genetic correlation, although environmental influences may also play a role. Due to the positive genetic correlation of blade length and blade width in S. japonica, selection for blade length could cause concurrent changes in blade width. Therefore, the identified locations of QTLs controlling both blade length and blade width  in S. japonica should provide markers for more efficient marker-assisted selection (MAS) for these two traits in future breeding. The genetically derived phenotypic correlation between blade length and width may be attributed to pleiotropy or to linkage disequilibrium 47 . Even though some QTLs for blade length and blade width were co-localized in the same genomic regions in this study, the possibility that the genes in the mapped QTLs are pleiotropic for these two traits must be further verified. The effects observed thus far might instead be due to linkage disequilibrium between different genes determining these two traits.
Fourteen Tic20 genes and three peptidase s8 and s53 genes were annotated to the QTL intervals between marker26422 and marker53425 in LG24 and between marker26129 and marker26127 in LG30, respectively. However, this finding does not exclude that the existence of other genes or regulators associated with blade length and blade width in S. japonica because the data used in this study were incomplete; the kelp genome contains some gaps in the contigs, and the correct assembly of some scaffolds remains uncertain.
The genes identified from the peptidase s8 and s53 gene families may be candidate genes for selection and breeding in kelp. In rice, one GS5 QTL was found to encode a putative serine carboxypeptidase belonging to the peptidase S10 family that contained a promoter region reported to be associated with grain width 48 . The peptidase s8 and s53 gene families are believed to have undergone a large evolutionary expansion in brown algae, and population genomic studies have showed that these gene families were artificially selected during the domestication of S. japonica 49 . Further research should explore the direct relation of these genes to blade length and width.
The expression differences in four Tic20 genes in the parental genotypes on the four cultivation dates were examined (Fig. 3). Whether these Tic20 genes are involved in determining blade length and blade width in S. japonica, remains to be verified. In the future, NIL (near isogenic lines) or CSSL (chromosome segment substitute lines) could be constructed to allow map-based cloning of the genes in the QTL region, or RNAi could be employed to verify the functions of these genes in kelp growth 50,51 .
In conclusion, a high-density genetic linkage map with 7627 SNPs was constructed for QTL analyses of blade length and width in S. japonica. A lot of 12 QTLs were found to be associated with blade length and 10 with blade width. The candidate genes mapped to these QTL intervals included 14 Tic20 genes and three peptidase s8 and s53 genes, which were proposed as candidate genes for blade length and width in S. japonica. The identified locations of QTLs controlling both blade length and width in S. japonica should lead to markers allowing for more efficient MAS for these two traits in future breeding. Figure 3. The expression pattern of four Tic20 genes in parent "860" and S. longissima on four observation days 1(April 16th), 2(May 6th), 3(July 2th), 4(July 24th). (a) Tic20-1 (b) Tic20-5 (c) Tic20-10 (d) Tic20-12. * and ** indicate significance level of P < 0.05 and 0.01, respectively.