Facilitating gene editing in potato: a Single-Nucleotide Polymorphism (SNP) map of the Solanum tuberosum L. cv. Desiree genome

Genome editing is a powerful tool for plant functional genomics allowing for multiallelic targeted mutagenesis. The recent development of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR associated protein 9 (Cas9) systems for gene editing in plants allows for simple, cost-effective introduction of site-specific double-stranded DNA breaks. The nuclear genomes of a homozygous doubled-monoploid potato clone (DM) and a heterozygous diploid clone (RH) have been sequenced in 2011. However, common potato cultivars display a highly heterozygous autotetraploid genome thus complicating target design for tetra-allelic gene editing. Here, we report on the SNP physical map of the widely used Solanum tuberosum L. cv. Desiree and on the position of the diverse indels providing an essential tool for target design in genome editing approaches. We used this tool for designing a specific gRNA and successfully knocking-out a newly discovered starch synthase gene (SS6) in potato. Resequencing data are publicly available at the Sequence Read Archive of the NCBI (accession number: PRJNA507597) and will represent a valuable resource for functional genomic studies of various metabolic pathways, cell and plant physiology as well as high-throughput reverse genetics in potato.

Potato belongs to the Solanaceae family such as tomato, eggplant, pepper or tobacco. About 200 tuber-bearing potato species are classified within the subgenus Potatoe of the genus Solanum. Common potato corresponds to the subspecies S. tuberosum ssp. tuberosum that is cultivated for its starch-, vitamin-rich tubers. Potato is the fourth most important staple crop in the world with 377 million tonnes produced in 2016 thus substantially contributing to global food safety (www.fao.org). Similar to other staple crops comprising rice, wheat or maize, important research effort is put into improving potato in a climate changing context concomitant with increasing global population. This includes biofortification and yield increase, pest protection, abiotic stress adaptation as well as post-harvest conservation and processing. As an example, starch metabolism remains intensively investigated since it influences the rheological properties of processed food or non-food potato-based products 1 . In a recent study, we identified a series of unknown proteins in association with potato starch granules by shotgun mass spectrometry 2 . Among these proteins, a yet unknown isoform of starch synthase, SS6, showed very high potential to be a key enzyme of the starch biosynthetic pathway. Common potato species are autotetraploid (2n = 4x = 48) thus complicating traditional breeding. Traits of interest follow tetrasomic inheritance patterns and chromatid segregation may occur during crossing. Substantial work was performed to produce diploid clones and facilitate genetic research. In particular, this led to the genome release of a homozygous doubled-monoploid (DM) and a heterozygous diploid (RH) clone 3 . This important milestone for genetic studies and potato improvement revealed that inbreeding depression is likely related to the high genome plasticity of the species and the presence of numerous potentially deleterious mutations 3 .
A series of transgenic strategies were employed both in functional genomic studies and crop breeding comprising insertional mutagenesis with the use of transposons or Agrobacterium mediated T-DNA insertions 4 . In both cases, multiple insertions can occur within plant genomes thus leading to undesired secondary mutations. Moreover, complete gene knockout (KO) in polyploids requires plant sexual propagation for at least two 1 Univ. Lille, CNRS, UMR8576 -UGSF -Unité de Glycobiologie Structurale et Fonctionnelle, F-59000, Lille, France. 2 Univ. Lille, CNRS, USR 3290 -MSAP -Miniaturisation pour la Synthèse l'Analyse et la Protéomique, F-59000, Lille, France. 3 These authors contributed equally: François Sevestre and Maud Facon. *email: nicolas.szydlowski@univlille.fr open generations involving potential loss of valuable fixed traits. In the highly heterozygous tetraploid potato, such approach is laborious and time consuming. On the other hand, sequence-specific DNA nucleases were used to introduce double strand breaks (DSBs) that are repaired by non-homologous end joining (NHEJ) or homologous recombination (HR) 5 . Zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) are chimeric proteins composed of a sequence-specific domain fused to a non-specific nuclease (Fok1) that have proven successful in editing the genome of different plant species comprising potato [6][7][8][9] . Application of the recently developed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR associated protein 9 (Cas9) technology increased rapidly over the past years 10 . Contrary to TALENs and ZFNs, CRISPR/Cas9 do not require specific protein engineering and was efficient for introducing deletions, insertions and performing multiplex genome modifications of plants 11 . The CRISPR/Cas9 system involves a single monomeric nuclease and a chimeric RNA composed of a 20-bp segment conferring site specificity as well as a scaffold sequence necessary for Cas binding 11 . Proof of concept for CRISPR mutagenesis in the tetraploid potato was achieved with the use of a U6 promoter upstream of the gRNA 12,13 . In order to avoid integration of foreign DNA in the genome, strategies using transient expression of the Cas9 and the gRNA or transfection of CRISPR-Cas9 synthetic ribonucleoproteins into potato protoplasts were implemented 14,15 . Recently, the granule-bound starch synthase (GBSS) was inactivated by CRISPR-Cas9 base editing of the conserved catalytic KTGGL-encoding locus in the tetraploid potato cultivar Desiree 1 .
Regardless of the genome editing system, inducing homozygous DSBs in tetraploid potato lines requires targeting of a SNP-/insertion-/deletion-free DNA region. Haplotype diversity in potato was investigated by sequence comparison of the RH and DM genomes 3 . The latter revealed that SNPs occur every 40 bp and that indels, of an average length of 12.8 bp, arise every 394 bp. This frequency was even higher when comparing both RH haplotypes with 1 SNP per 29 bp and 1 indel per 253 bp. Furthermore, SNP-based analysis of 23 North American potato modern cultivars showed a population diversity (π) of 0.0105, which was significantly higher than previous resequencing studies of crops 16 . Therefore, determining both allelic variation of the target region (polymorphisms intrinsic to the studied potato line) and sequence divergence from the DM reference genome is essential for designing targets leading to homozygous gene editing in common tetraploid potato lines.
Here we report on the NGS-based resequencing analysis of the widely used Solanum tuberosum L. cv. Desiree. About 100 million reads were aligned on the DM reference genome prior to SNP-/Indel-calling. Combining variations within the genome of the potato cultivar Desiree (Desiree vs. Desiree) and variations between the DM reference genome and that of cv. Desiree (Desiree vs. DM), our analysis showed occurrences of 1 SNP/68 bp, 1 insertion/1173 bp and 1 deletion/1753 bp. Furthermore, we provide with a physical map of polymorphisms occurring within the assembled regions of each of the 12 Desiree chromosomes illustrating the heterogeneity of polymorphisms. In particular, polymorphism enrichment could be observed at the edges of chromosomes 7 and 10. On the other hand, variations were poorer in the central region of chromosome 7 and 12 as well as in the regions ranging from 10 to 60 Mb and 15 to 30 Mb of chromosome 1 and 3, respectively. Increasing the number of potential targets is essential to improve the success rate in tetra-allelic mutagenesis or in specific approaches such as CRISPR-driven base editing. Thus, polymorphisms intrinsic to the Desiree genome (Desiree vs. Desiree) were also investigated by filtering out divergences from the DM reference genome (Desiree vs. DM). This analysis revealed that 70% and 67% of the SNPs and indels, respectively, occurred within the genome of Desiree. Most importantly, this work represents an essential tool for high throughput functional genomic studies of the potato. This was illustrated by the knock-out mutagenesis of SS6, a recently discovered starch synthase isoform most likely involved in starch biosynthesis 2 .

Results
In order to locate SNPs, insertions and deletions occurring in a commonly used tetraploid potato cultivar, leaf genomic DNA was isolated from in vitro cultivated cv. Desiree plants prior to genome resequencing. 99,713,534 reads of 2 × 150 bp were generated and mapped on the S. tuberosum Group Phureja DM1-3 Version 4.03 Pseudomolecule genome Sequence (http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml) 3 . The percentage of mapped reads was 83.9%, corresponding to a mean coverage of almost 30X. This sequencing depth was similar to previous resequencing studies and allowed for confident identification of polymorphisms in tetraploid potato plants 3,16 . Variant calling analysis identified 11,939,720 SNPs, 707,507 insertions and 470,182 deletions. These variations included local sequence divergence between cv. Desiree and the DM reference (Desiree vs DM) and sequence polymorphism occurring between haplotypes of the Desiree tetraploid genome (Desiree vs. Desiree). The occurrence and distribution of these variations as well as the different types of nucleotide substitutions are summarized in Table 1. These data show that polymorphisms occur every 59 bp on average with SNPs being the predominant type of variation (Table 1). At the whole genome level, an average occurrence of one SNP every 68 bp was observed while indels were less frequent with one insertion or deletion per 1173bp and 1753bp, respectively (Table 1). Furthermore, SNP occurrence was most prevalent on chromosome 11 with one SNP every 55bp while only one SNP per 92bp was observed on chromosome 10 ( Table 1).
The genome-wide mean transition (A/G or T/C) to transversion (A/C, T/A, C/G or G/T) ratio was 1.603, which was consistent with a previous work 17 . Both types of transition (30.8% of the total SNPs) appeared at a similar frequency, with 3,568,987 A/G and 3,565,174 C/T transitions at the genome scale. Conversely, transversion occurrence varied according the nature of the substitution. T/A was the most frequent modification with 1598,132 events at the genome scale (13.8% of the total SNPs), while G/C transversions occurred 616,675 times (5.3% of the total SNPs). A/C and G/T transversions were present at intermediate and very similar frequencies with 1,118,093 and 1,118,478 events at the genome scale (9.65% of the total SNPs, respectively). In order to validate these data, the SNPs identified in this study where compared to the SolCAP Infinium High Confidence SNPs available at the PGSC website (http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml). These data were produced by RNA-sequencing of Atlantic, Premier Russet, and Snowden potato cultivars. Our results www.nature.com/scientificreports www.nature.com/scientificreports/ corroborated 40501 of the 70326 SNPs present in the SolCAP database (Table 2). At the chromosome scale, the percentage of retrieved SNPs ranged from 45 to 72% for chromosome 2 and 6, respectively. Variants intrinsic to the Desiree genome were evaluated by filtering out the polymorphisms between cv. Desiree and the DM reference (Desiree vs. DM) that were homozygous in cv. Desiree (Desiree vs. Desiree) ( Table 3). 7,768,568 SNPs and 713,986 indels were identified at the scale of the genome. The proportions of retrieved polymorphisms compared to the global analysis were 70% and 67% for SNPs and indels, respectively, at the scale of the genome thus confirming the strong intrinsic heterozygosity of the cultivar (Table 3). At the chromosomic scale, the proportions of retrieved SNPs ranged from 51% to 81% on chromosome 5 and 3, respectively, while retrieved indels ranged from 57% to 74% on the same chromosomes.

Chromosome length (Mbp) Deletions Insertions SNPs
Occurrence (1 variation in every × bp) Transition Transversion www.nature.com/scientificreports www.nature.com/scientificreports/ To investigate the local distribution of polymorphisms, a physical map of the 12 potato chromosomes was constructed based on the DM reference genome (Fig. 1). Variant calling data were processed with the use of the "lattice" and "Sushi" R packages to illustrate polymorphism and gene densities (Fig. 1). As expected, gene density was greater at chromosome edges when compared to centromere regions (Fig. 1) 3 . Although not correlated at the scale of individual chromosomes, SNP and indel densities followed similar local distribution patterns (Fig. 1). Furthermore, polymorphism density was very low in some regions including the central part of chromosome 7 and 12 as well as in the regions ranging from 10 to 60 Mb and 15 to 30 Mb of chromosome 1 and 3, respectively (Fig. 1). On the other hand, SNP enrichment could be observed in some regions, especially at the edges of chromosome 7 and 10 ( Fig. 1). Moreover, the global density of variations was correlated with gene density, particularly in chromosome 1, 4, and 12.

Deletion Insertion SNP A/G C/T A/C T/A C/G G/T
In order to illustrate the efficiency of knowing which DNA regions should be avoided when designing gRNAs for tetra-allelic gene knockout in tetraploid potato, we used the data reported here to mutate the SS6 gene in S. tuberosum, cv. Desiree (Fig. 2). A target region (GT1), displaying good predicted efficiency and specificity for CRISPR/Cas9 gene inactivation, was identified in a polymorphism free sequence at the end of the second exon of the SS6 gene (Fig. 2a). The corresponding gRNA was cloned into pDIRECT_22a by golden gate reaction 18 . The latter construct was used for Agrobacterium-mediated transformation of potato explants and fourteen regenerated plant transformants were selected by PCR amplification of the Cas9 gene present on pDIRECT_22a. A second PCR amplification of the targeted region of SS6 was followed by Sanger sequencing and electrophoregram fitting analysis with the use of the ICE tool (https://ice.synthego.com/) (Fig. 2b,c). Among the fourteen transformed plants, seven (SS6-GT1-1 to SS6-GT1-7) showed a coefficient of determination (R 2 ) above 0.9 accompanied by KO (knock-out) -scores ranging from 89 to 99 (Fig. 2b). The seven other transformants (SS6-GT1-8 to SS6-GT1-14) displayed a R 2 ranging from 0.52 to 0.82 and KO-scores between 31 and 72 (Fig. 2b). While the probability that lines SS6-GT1-1 to SS6-GT1-7 are knock out mutants is very high, only SS6-GT1-1 harbored the same mutation on each of the four SS6 alleles resulting in a clear non-superimposed electrophoregram with a 4bp deletion at the site of predicted DSB (Fig. 2b,c). On the other hand, lines SS6-GT1-2 to -7 carried a mix of predicted indels of one or two bp in length (Fig. 2b).

Discussion
Regardless of the genome editing strategy, targeting a polymorphism-free DNA region is crucial for generating tetra-allelic mutations in heterozygous polyploids. In functional genomic approaches, target design is commonly performed based on the released genome sequence of the organism. The available sequence of the potato genome was obtained from a homozygous doubled-monoploid (DM) and a heterozygous diploid (RH) clone. In the present study, we resequenced that of the laboratory cultivar Desiree which was historically chosen for its ease of transformation and is still widely used for biomolecular studies of the potato. Variant calling analysis showed occurrences of 1 SNP/68 bp, 1 insertion/1173 bp and 1 deletion/1753 bp. These variations comprised both sequence divergences from the reference genome (Desiree vs. DM) and the variability intrinsic to the Desiree genome (Desiree vs. Desiree). Sequence variations between Desiree and DM genomes have to be taken into account for determining DNA binding sequences in gene editing approaches. However, when these polymorphisms are homozygous within the studied potato line, sequences overlapping the corresponding positions can be designed as targets for homozygous gene editing. Thus, we filtered out the polymorphisms between cv. Desiree and the DM reference (Desiree vs. DM) that were homozygous in cv. Desiree (Desiree vs. Desiree). 70% and 67% for SNPs and indels, respectively, were retrieved in this analysis indicating that particular attention must be paid  www.nature.com/scientificreports www.nature.com/scientificreports/ during target design for multiallelic gene editing of potato. For instance, inducing DSBs with the use of CRISPR/ Cas9 requires designing a ≈20bp guide sequence (gRNA) to pair with target DNA. The presence of SNPs or Indels within the targeted sequence alters the binding of gRNAs thus misbalancing the on-/off-target Cas9 activity. Individual gene sequencing can be performed to overcome this issue. However, this approach is time consuming and not compatible with high throughput functional genomics.
Here, we used these data to select a target region free of variants for inactivating the SS6 gene by CRISPR/ Cas9. Among fourteen transformed plants, seven showed a very high probability of knock-out mutation on each of the four SS6 alleles. Moreover, one mutant line contained identical deletions on each of the four chromosomes. Further investigation of the starch structure and morphology from these mutant lines will reveal the function of SS6 that remains to be deciphered. Furthermore, the inactivation of this enzyme may lead to modifications of starch properties potentially resulting in industrial applications. The data reported here will allow improving the design workflows for gene editing of the routinely used cultivar Desiree. Potato is the fourth most important staple crop in the world implying important breeding efforts in a climate changing context concomitant with increasing global population. Thus, accelerating research on potato genetics and physiology will likely lead to key fundamental discoveries as well as important industrial applications. Heat maps of gene density (genes.Mb −1 ) were constructed with the use of the "lattice" (densityplot function) and "Sushi" (plotbed function) R packages and are displayed within each chromosome. SNPs, insertions and deletions were identified by variant calling on sequence alignment of the resequenced Desiree cultivar and the DM reference genome. Polymorphism density was estimated with the use of the "lattice" (densityplot function) and "Sushi" (plotbed function) R packages and plotted next to each chromosome. The x-axis corresponds to SNP (green line), insertion (black line) and deletion (red line) densities displayed in number of variations. Sample preparation and resequencing. Leaf DNA from tetraploïd Solanum tuberosum L. cv. Desiree was extracted with the use of the Nucleospin plant II kit (Macherey Nagel). A shotgun library was constructed from 100 ng of DNA using "Truseq DNA nano sample prep" (Illumina Inc.). Validation was then realized using the "DNA HS Assay kit" (Agilent). Mean size of fragments was 550 bp. The library was then sequenced using an Illumina HiSeq 2500 generating 99,713,534 paired-end reads of 2 × 150bp corresponding to a 35X mean coverage of the 844 Mb potato genome. Read quality have been analyzed using FastQC v0.11.5, giving 90.44% of reads with a Phred Q quality score >30.
Read alignment and variant calling. The following steps were performed using the Geneious 11.1.2 software (https://www.geneious.com) 20 . Poor quality data were trimmed from the ends of the paired reads with an error probability limit of 0.05. Paired reads were then mapped to the reference genome pseudomolecules using bowtie2 mapper 21 set to high sensitivity resulting in the assembly of 167,343,472 of the 199,427,068 reads on 13 contigs. Variant calling was performed on the whole assembled contigs using the Geneious variant finder tool with the following settings: variations positions must be covered by at least 8 reads with a frequency above 0.05. The variant P-value and the strand-bias P-value limits were set to 10 −5 . This first analysis combined variations within the genome of the potato cultivar Desiree (Desiree vs. Desiree) and variations between the DM reference genome and that of cv. Desiree (Desiree vs. DM). A second analysis was performed as described above except that the divergences from the DM reference genome (Desiree vs. DM) were filtered out by excluding the reference sequence in the variant calling analysis. www.nature.com/scientificreports www.nature.com/scientificreports/ target design and plasmid construction. The full CDS of StSS6, located on chromosome 7 (position 40,939,715 to 40,924,572, PGSC_DM_v4.03_pseudomolecules, http://solanaceae.plantbiology.msu. edu/) was reconstructed using the CDS of the SS6 gene from Solanum lycopersicum (accession number: Solyc07g042830.2, http://solgenomics.net) as a template. Both genomic regions were aligned in Geneious 11.1.2 and the partially annotated CDS from S. tuberosum (position 40,939,715 to 40,932,630, accession number: PGSC0003DMG402013540, http://solanaceae.plantbiology.msu.edu/) described in 22 was extended according to the corresponding gene structure of S. lycopersicum. Intron splicing signals were positioned with respect to the tomato gene and gene structure was confirmed by in silico translation of the complete CDS. The obtained protein sequence displayed 100% identity with the full length annotated SS6 protein sequence (XP_006353746.1, https:// www.ncbi.nlm.nih.gov/protein/).
The targeted sequence for the inactivation of SS6 (GT1: 5′-GCATCCATTCAAGGCAAGCA-3′) was defined to meet with a high predicted efficiency (activity score of 0.69) and specificity (score of 100%) with the use of the geneious 11.1.2 software and a frame shift probability (microscore of 67.15) using the CRISPR-P toolbox (http://crispr.hzau.edu.cn/CRISPR2/). To favor targeting of the four SS6 alleles, the absence of polymorphisms in the targeted sequence was confirmed using the variant calling data generated in this study. The gRNA was cloned into the pDIRECT_22a plasmid by golden gate reaction using the AtU6 sense gRNA (5′-GATTGCATCCATTCAAGGCAAGCA) and antisense gRNA (5′-AAACTGCTTGCCTTGAATGGATGC) oligonucleotides purchased from Eurogentec (Belgium) and according to protocol 2B from 18 . The resulting vector pDIRECT_22a:GT1 contained a kanamycin resistance gene, a plant codon-optimized Cas9 gene and the sgRNA targeting the GT1 sequence of SS6.
Agrobacterium-mediated transformation. Agrobacterium tumefaciens C58pMP90 was transformed with the pDIRECT-22a:GT1 plasmid and cultivated during 48 h at 28 °C in LB medium containing 50 mg/L rifampicin, 25 mg/L gentamicin and 50 mg/L spectinomycin. Potato (cv. Desiree) plants were cultivated on solid Murashige and Skoog (MS) medium supplemented with vitamins (Duchefa, The Netherlands) and containing 2% sucrose (medium A) with a photoperiod of 16-h light (65 µmol photons.m 2 .sec −1 )/8-h dark at 20 °C. Stem and petiole explants were cut from 4-week-old plants prior to incubation on sterile MS plates including vitamins (Duchefa, The Netherlands), 2% sucrose, 1 mg/L indole-3-acetic acid (Sigma-Aldrich, USA), 1 mg/L zeatin-riboside (Sigma-Aldrich, USA) and 1 mg/L gibberellin A3 (Sigma-Aldrich, USA) (medium B) during 24h in the same conditions of light and temperature. Explants were then dipped in a suspension of A. tumefaciens containing pDIRECT-22a-GT1 (OD 600 = 0.2) prior to incubation on medium B at 25 °C in the dark during 48 h. Explants were washed with sterile water and 1X MS medium containing 2% sucrose, 500 mg/L cefotaxime, 500 mg/L Ticarcillin/clavulanic acid and 50 mg/L kanamycin prior to be placed on solid medium B during 2 weeks under the aforementioned conditions until the apparition of calli. Explants were then subcultured every two weeks on medium C (same as medium B with a 10-fold decreased concentration of Indolo-3-acetic acid) until shoot regeneration. plant selection and genotyping. Regenerated shoots were isolated and cultivated on solid medium A. Mature leaves from 14 transformed plants were cut and genomic DNA was isolated using the cetyltrimethylammonium bromide (CTAB) method 23 . In order to confirm the integration of the plasmid, the Cas9 gene was amplified by PCR using the following primers: Cas9For: 5-TTCGGGCGTTCCAGTGATAC-3, Cas9-rev: 5-TATGGTGTGCAACGGTCTCC-3 and the DreamTaq DNA polymerase (Thermo Fisher Scientific, Waltham, Massachusetts). The following parameters were used for PCR reaction: denaturation for 2 min at 95 °C; 30 consecutive cycles at 95 °C for 1 min, 60 °C for 30 sec, 72 °C for 25 sec; and finally elongation at 72 °C for 10 min. The presence of mutations on the targeted sequence was investigated by PCR amplification using the following primers: GT1For: 5-TTCGGGCGTTCCAGTGATAC-3, GT1Rev: 5-TATGGTGTGCAACGGTCTCC-3 and the high fidelity DNA polymerase (kappa hifi, Roche, Basle, Switzerland) with the following parameters: denaturation for 2 min at 95 °C; 30 consecutive cycles at 95 °C for 1 min, 60 °C for 30 sec, 72 °C for 25 sec. The PCR products were sequenced by the Sanger method using GT1Rev: 5-TATGGTGTGCAACGGTCTCC-3 (Eurofins genomics, Ebersberg, Germany). Sequencing data were analyzed by comparison with the corresponding electrophoregram of the wild-type sequence using the ICE tool available at https://ice.synthego.com/.