A high-density SNP genetic map consisting of a complete set of homologous groups in autohexaploid sweetpotato (Ipomoea batatas)

Sweetpotato (Ipomoea batatas) is an autohexaploid species with 90 chromosomes (2n = 6x = 90) and a basic chromosome number of 15, and is therefore regarded as one of the most challenging species for high-density genetic map construction. Here, we used single nucleotide polymorphisms (SNPs) identified by double-digest restriction site-associated DNA sequencing based on next-generation sequencing technology to construct a map for sweetpotato. We then aligned the sequence reads onto the reference genome sequence of I. trifida, a likely diploid ancestor of sweetpotato, to detect SNPs. In addition, to simplify analysis of the complex genetic mode of autohexaploidy, we used an S1 mapping population derived from self-pollination of a single parent. As a result, 28,087 double-simplex SNPs showing a Mendelian segregation ratio in the S1 progeny could be mapped onto 96 linkage groups (LGs), covering a total distance of 33,020.4 cM. Based on the positions of the SNPs on the I. trifida genome, the LGs were classified into 15 groups, each with roughly six LGs and six small extra groups. The molecular genetic techniques used in this study are applicable to high-density mapping of other polyploid plant species, including important crops.

due to genome multiplication, which can lead to heterosis, gene redundancy, loss of self-incompatibility, and gains in asexual reproduction 8 . Therefore, constructing genetic maps for polyploid species is important for identifying beneficial trait loci and performing genome-based breeding. Polyploid plants can be allopolyploids or autopolyploids. In allopolyploids, chromosome pairings generally occur between homologous chromosomes, but not between homeologs, with a few exceptions 9 . Therefore, the manner of inheritance is expected to be similar to that in diploids, i.e., Mendelian inheritance. By contrast, in autopolyploids, one chromosome pairs with either homologous chromosome counterpart, resulting in a complex inheritance pattern. In the progeny of autotetraploid crops including potato (Solanum tuberosum), rose (Rosa hybrida), alfalfa (Medicago sativa), and blueberry (Vaccinium corymbosum), theoretical models of linkage analysis have been thoroughly investigated [10][11][12][13] , and TetraploidMap software is available to construct linkage maps for autotetraploid species 14 . On the other hand, in autohexaploids, e.g., sweetpotato (Ipomoea batatas), the genetic model is complex due to the 13 possible allele combinations (Supplementary Table S1). Among these, simplex (e.g., AAAAAA × AAAAAa) and double-simplex markers (AAAAAa × AAAAAa) show simple disomic-like segregation patterns of 1:1 and 1:2:1, respectively, in the progeny (Supplementary Table S1). For linkage analysis, simplex markers are used to construct framework linkage maps, but double-simplex markers, together with duplex (AAAAAA × AAAAaa) and triplex markers (AAAAAA × AAAaaa), are employed to identify homologous groups (HGs) 15,16 .
Sweetpotato is an autohexaploid crop with 90 chromosomes (2n = 6x = 90), as mentioned above. Sweetpotato is used worldwide for vegetable and staple food production, as well as food processing starch production. However, the inheritance patterns of agronomically important traits in this crop are extraordinary complex due to its autohexaploidy and large chromosome number. Therefore, genetic maps are needed for efficient breeding. Several genetic maps for sweetpotato based on random amplified polymorphic DNA (RAPD), SSR, amplified fragment length polymorphism (AFLP), and transposon-insertion polymorphism have been reported to date [15][16][17][18][19][20] . In addition, homologous groups (HGs) have been identified via anchoring with double-simplex, duplex, and triplex markers 15,16 . However, these maps are considered to be incomplete, as they contain many short linkage groups (LGs) consisting of a few DNA markers, and small numbers of anchor markers were used to identify HGs.
Three strategies can be utilized to overcome this situation, which have several advantages over previous methods: (1) SNP markers can be used to increase the number of segregated genetic loci; (2) the genome sequence of I. trifida, a likely diploid ancestor of sweetpotato 21 , can be used as a reference to detect SNPs and to identify HGs; and (3) an S1 mapping population produced from self-pollination of a single parent can be used to simplify the complex genetic mode of autohexaploidy (Supplementary Table S1). In the current study, to explore these possibilities and establish a high-density genetic map for sweetpotato, we employed double-digest RAD-Seq (ddRAD-Seq) technology 22 , in which genomic DNA is digested with two different restriction enzymes instead of a single nuclease, as used in the original RAD-Seq protocol 6 . Using our recently established analytic pipeline for ddRAD-Seq 23 , we successfully produced the first high-density genetic map for sweetpotato.

Results
ddRAD-Seq of an S1 mapping population of cv. Xushu 18. We generated an S1 mapping population (n = 142) via self-pollination of the sweetpotato cultivar, Xushu 18. We used genomic DNA extracted from individuals of this population to construct ddRAD-Seq libraries using two combinations of restriction enzymes, PstI-MspI (PM) and PstI-EcoRI (PE). Paired-end sequencing data, as well as single-read sequences, were obtained from the libraries. Approximately 1.8 million (M), 1.2 M, and 1.9 M high-quality reads per line were obtained after trimming adapters and low-quality sequences from paired-end data from the PstI-MspI library (PM-PE) and the PstI-EcoRI library (PE-PE) and from single-read data obtained from the PstI-MspI library (PM-SR), respectively (Supplementary Table S2). The reads were mapped onto the reference genome sequence of I. trifida, ITR_r1.0, with mapping alignment rates of 73.4% (PM-PE), 78.6% (PE-PE), and 75.5% (PM-SR) (Supplementary Table S2). Of the entire reference sequence (512,990,885 bases), 2,675,332 (0.5%), 1,901,821 (0.4%), and 1,897,014 bases (0.4%) were covered at a depth of ≥ 10 reads in the PM-PE, PE-PE, and PM-SR libraries, respectively (Supplementary Table S2). As a result, 5,015,871 bases (1.0%) from the reference genome were covered by ≥ 10 reads from the three libraries. We used the combined data from the three libraries for subsequent analysis.
SNP calling and selection of double-simplex SNP loci. When sequence reads from the six genomes of hexaploid sweetpotato were mapped onto the haploid genome sequence of I. trifida, we assumed that the SNP loci would show six different genotypes: AAAAAa, AAAAaa, AAAaaa, AAaaaa, Aaaaaa, and aaaaaa, where "A" and "a" represent alleles identical to and different from the I. trifida alleles, respectively. The AAAAAA genotype would not be identified among SNP loci due to the lack of sequence differences between the two species. Hereafter, "A" and "a" are referred to as REF (reference) and ALT (alternative) alleles, respectively. In addition, the frequency of ALT alleles for each SNP locus is referred to as the ALT allele frequency (AAF), which was calculated with the following formula: ( Based on the sequence alignment data, 94,361 SNP candidate loci were identified after filtering using two criteria: (i) depth of coverage ≥ 10 for each S1 line and (ii) proportion of missing data < 0.25 for each locus. Since we used only double-simplex markers (AAAAAa × AAAAAa or Aaaaaa × Aaaaaa) for subsequent linkage analysis, further filtering was required to exclude double-duplex (AAAAaa × AAAAaa and AAaaaa × AAaaaa) and double-triplex loci (AAAaaa × AAAaaa). We then calculated the AAFs for each locus. As expected, the distribution pattern of the AAFs exhibited six peaks, with values of 0.167 (= 1/6), 0.333 (= 2/6), 0.500 (= 3/6), 0.667 (= 4/6), 0.833 (= 5/6), and 1.000 (= 6/6) (Fig. 1). We selected 29,701 (AAAAAa × AAAAAa) and 6,889 (Aaaaaa × Aaaaaa) double-simplex loci for further analysis.
Subsequently, we determined the genotypes for each individual for the 36,590 (29,701 + 6,889) SNPs. In the "AAAAAa × AAAAAa" double-simplex SNPs, AAFs of 0.000 (AAAAAA), 0.167 (AAAAAa), and 0.333 (AAAAaa) were expected to segregate at a ratio of 1:2:1 in the S1 progeny. However, it was difficult to distinguish between the AAAAAa and AAAAaa genotypes because numbers of reads in each individual were insufficient to differentiate AAFs of 0.167 and 0.333 significantly. Therefore, we defined an AAF of 0 as indicating homozygous REF alleles and AAF > 0.000 as indicating "not homozygous" REF alleles, with an expected segregation ratio of 1:3, such as dominant loci. We applied the same strategy to the "Aaaaaa × Aaaaaa" double-simplex candidates and determined that AAF of 1.000 indicates homozygous ALT alleles, whereas AAF < 1.000 indicates "not homozygous" ALT alleles, with an expected segregation ratio of 1:3. We selected a subset of segregation data fitting the expected ratio via Chi-square tests (P ≥ 0.01).

Grouping and ordering of double-simplex SNP loci.
To construct a genetic map, we grouped 28,516 double-simplex SNPs showing a 1:3 segregation ratio in the S1 progeny using the "group" command in the OneMap program. Using the criterion LOD value = 3-10, we obtained 1-105 groups with at least 20 SNPs ( Supplementary Fig. S1). Among these, groups with a LOD value of 7 were chosen for further analysis, because, using this criterion, 28,091 SNPs (98.5%) were divided into 98 groups, a number remarkably close to the expected value of 90, i.e., the number of chromosomes in sweetpotato.
We performed locus ordering within a group with the OneMap ordering command to generate 98 LGs consisting of the 28,087 SNP loci, with a total length of 83,678 cM. The average number of mapped loci on a single LG was 286.6 loci/LG, and the average map length was 853.9 cM/LG. Since genotyping errors and missing data are frequently encountered when genotyping via a sequencing strategy 7 , 1.3% inherent errors and 2.2% missing of the 3.99 M data points (28,087 SNPs across the 142 S1 lines) data were corrected and imputed, respectively, using a map-based imputation algorithm implemented with the Maskov program 24 . The resulting error-reduced imputed data were again subjected to grouping and ordering analysis with OneMap. As a result, we obtained a genetic map consisting of 96 LGs, including 28,087 SNPs in 15,634 co-segregation bins, covering a total distance of 33,020.4 cM (Fig. 2, Table 1, Supplementary Table S3). The mean length and number of mapped SNPs of a single LG were 344.0 cM (ranging from 26.3 to 904.5 cM) and 292.6 SNPs (19-692 SNPs), respectively.
The SNPs on the genetic map were functionally annotated (Supplementary Table S4), because, simultaneously, these SNPs were physically mapped on the reference sequences of the I. trifida genome 21 , on which 62,407 genes that occupies 12.5% of the genome were predicted. A total of 24,732 SNPs (88.1%) were in gene regions, while the other 3,236 (11.5%) were in intergenic sequences. The remaining 119 (0.4%) were not assigned. Of the gene SNPs, 146 (0.5%) and 5,847 (20.8%) were putative loss-of-function and non-synonymous SNPs, respectively. The proportions of the SNP effects in the three independent libraries were similar (Supplementary Table S4).

Identification of HGs from the genetic
LGs. We reasoned that the 96 LGs belonged to 15 HGs corresponding to the basic chromosome number of sweetpotato. To validate this hypothesis, we assigned the LGs onto the genome sequence of I. trifida (ITR_r1.0). The 28,087 SNPs on the 96 LGs were positioned onto 2,889 genome scaffolds covering 236.5 Mb (46.1%) of the total length of ITR_r1.0 (513.0 Mb). Each LG was assigned to an average of 81.6 scaffolds, ranging from 7 to 157. Of the 2,889 I. trifida scaffolds, 194 were assigned to six LGs (probably belonging to the same HG), as expected, while 2,657 and 38 scaffolds were assigned to fewer and more than six LGs, respectively. Based on the scaffolds assigned to multiple LGs, the 96 LGs were grouped into HGs (Fig. 3, Supplementary Table S5), and the HGs and LGs were numbered according to map length. Eleven HGs (Ib01, Ib03, Ib04, Ib05, Ib06, Ib08, Ib09, Ib10, Ib12, Ib13, and Ib15) included six LGs, whereas three HGs (Ib02, Ib07, and Ib11) consisted of six main LGs and small extra LGs. For example, Ib14 comprises five main groups and two small groups. Therefore, the 96 LGs were successfully grouped into 15 HGs, as expected.

Discussion
To the best of our knowledge, this is the first report describing a high-density SNP linkage map for the autohexaploid sweetpotato, I. batatas, which was produced using ddRAD-Seq analysis, a high-throughput genotyping  A a a a a ]  [A a a a a a ] [a a a a a a ] strategy. In the past decade, several genetic maps for sweetpotato have been constructed based on RAPD, SSR, AFLP, and transposon-insertion polymorphism markers [15][16][17][18][19][20] . The total length of the genetic map from this study, 33,020.4 cM, was much longer than those of other reported sweetpotato maps [15][16][17][18][19][20] . We have considered three possibilities to explain the discrepancy although reason for the differences is not fully clear. The first is differences of genome coverage of the maps. Our map would cover the longest regions of the sweetpotato genome because of the highest density among the reported maps. The second is due to different software programs, mapping algorithms, map functions, ratios of the number of markers and populations size, and treatment of genotyping errors. All of them could explain the differences in map length as reported by the previous studies 25,26 . The third might be derived from residual genotyping errors. Every 1% error rate in a marker adds approximately 2 cM to the map 27 , even though substantial errors in our data were corrected with Maskov.

[A A A A A A ] [A A A A A a ] [A A A A a a ] [A A A a a a ] [A
To date, HGs in the sweetpotato genetic maps had not been identified clearly and completely due to the shortage of duplex and triplex anchor loci in the previous marker systems, which were insufficient for saturating the maps with genetic loci, since sweetpotato has an autohexaploid genome consisting of 90 chromosomes (2n = 6x = 90). In this study, we successfully performed complete identification of the 15 HGs in sweetpotato by  improving upon previous techniques as follows: (1) employing SNPs from ddRAD-Seq analysis through NGS technology to maximize genetic loci on the map, (2) using the genome sequence of I. trifida, a likely diploid ancestor of I. batatas, as a reference, which can be used to physically identify HGs, and (3) using an S1 mapping population to simplify the genetic mode for autohexaploid species. Since SNPs are the most abundant polymorphisms across genomes, their use greatly facilitates rapid genetic analysis of autopolyploid crops with large numbers of chromosomes and complex genomes, such as sugarcane (2n = 10x = 100) 28 and sweetpotato (this study). To identify SNPs, we mapped ddRAD-Seq reads from autohexaploid sweetpotato onto the genome sequence of I. trifida, a likely diploid ancestor of sweetpotato. The mapping rate of RAD-Seq reads for each progeny was approximately 70% or higher (Supplementary Table S2), indicating that the I. trifida genome has sufficient similarity to the sweetpotato genome to be used as a reference genome sequence. In addition, functions of the SNPs could be annotated using the I. trifida genome information. Approximately 90% of the SNPs were in gene regions (Supplementary Table S4), which percentage is much higher than the gene fraction of the I. trifida genome, 12.5%. As proposed in the previous study 23 , the combinations of restriction enzymes, PstI-EcoRI and PstI-MspI, employed for the ddRAD-Seq library constructions were effective to detect gene SNPs. The ddRAD-Seq maps based on the enzymes would be biologically meaningful and beneficial for functional genomics, molecular genetics, and marker-assisted selection in breeding.
In conclusion, we established the first high-density genetic map of hexaploid sweetpotato using SNP markers obtained from ddRAD-Seq analysis. This is also the first report of the identification of the complete set of 15 HGs in the sweetpotato genetic map using a physical mapping strategy. This genetic map, with a high density of SNPs, could facilitate the identification of all haplotypes of the autohexaploid genome of sweetpotato. This haplotype information would be helpful for haploid (or phased) genome sequencing in sweetpotato, meaning that it might no longer be impossible to decode the complex sweetpotato genomes completely. The molecular and genetic techniques developed in this study are also applicable to other polyploid plant species, including important crops.

Methods
Plant materials. A single sweetpotato cultivar, Xushu 18, the most popular cultivated line in China, was used in this study. The plants were artificially self-pollinated to obtain seeds to generate an S1 mapping population (n = 142) in NARO, Japan. Genomic DNA was extracted from leaves of individuals from this population, as well as those of the parental line, using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). ddRAD-Seq analysis. Library construction and sequencing analysis of the 142 S1 lines and the parent (Xushu 18) were performed as described by Shirasawa et al. 23 . The ddRAD-Seq libraries were constructed with two combinations of restriction enzymes, PM and PE, and DNA fragments 300-900 bp in length were fractionated with BluePippin (Sage Science, Beverly, MA, USA). The nucleotide sequences of the libraries were determined on a HiSeq2000 system (Illumina) in single-read (101-base) or paired-end mode (93-base).
Data processing. Primary data processing of sequence reads was performed as described by Shirasawa et al. 23 with minor modifications. Low-quality sequences were removed and adapters were trimmed using PRINSEQ (version 0.20.4) 29  SNP mining. High-confidence SNP candidates were selected using VCFtools (version 0.1.12b) 34 with the following criteria: (i) depth of coverage ≥ 10 for each data point and (ii) proportion of missing data < 0.25 for each locus. Using the filtered VCF files, values for read depth in genotype fields in the VCF file, i.e., DP (Quality Read Depth of bases with Phred score ≥ 15), RD (Depth of reference-supporting bases), and AD (Depth of variant-supporting bases), for each position were extracted across the 142 S1 individuals, and AD to DP ratios were used to calculate AAFs for each locus. SNP sites with an AAF ≥ 0.0833 and < 0.2500 and those with an AAF ≥ 0.7500 and < 0.9167 were selected as "AAAAAa × AAAAAa" and "Aaaaaa × Aaaaaa" double-simplex candidates, respectively.
Subsequently, the genotypes of each individual were determined as described in the Results section. In short, for "AAAAAa × AAAAAa" loci, SNPs with an AAF of 0.000 and > 0.000 were determined to be homozygous REF alleles (AAAAAA) and "not homozygous" REF alleles, respectively. Likewise, for "Aaaaaa × Aaaaaa" loci, SNPs with an AAF of 1.000 were regarded as homozygous ALT alleles (aaaaaa), while the remaining SNPs with an AAF of < 1.000 were considered to be "not homozygous" ALT alleles. SNPs of homozygotes versus not homozygotes fitting a 1:3 segregation ratio based on Chi-square tests (P value of ≥1%) were selected for further linkage analysis.
Linkage analysis for map construction. The data for the selected segregating SNPs were grouped using the OneMap (version 2.0.4) 35 via the "group" command with LOD of 7 and ordered using the "order" command (parameters of n.init = 5, subset.search = "twopt", twopt.alg = "rcd", THRES = 3, draw.try = TRUE, wait = 1, and touchdown = TRUE). Missing data and incorrect genotypes were then imputed with the Maskov (version 1.0) 24 using default parameters (max error = 1; threshold = 1; column to plot = 1; data missing% = 33). The imputed data were again grouped and ordered with the OneMap using the same parameters described above. A genetic linkage map was drawn with the MapChart (version 2.2) 36 .
Data availability. Nucleotide sequence data from the ddRAD-Seq libraries are available in the DDBJ Sequence Read Archive under accession numbers DRA004836 for PM-PE, DRA004837 for PM-SR, and DRA004838 for PE-PE. Details about the SNPs and maps are available in the Sweetpotato GARDEN database (http://sweetpotato-garden.kazusa.or.jp/).