Whole-genome sequencing of tetraploid potato varieties reveals different strategies for drought tolerance

Climate changes leading to increasingly longer seasonal drought periods in large parts of the world increase the necessity for breeding drought-tolerant crops. Cultivated potato (Solanum tuberosum), the third most important vegetable crop worldwide, is regarded as drought-sensitive due to its shallow root architecture. Two German tetraploid potato cultivars differing in drought tolerance and their F1-progeny were evaluated under various drought scenarios. Bulked segregant analyses were combined with whole-genome sequencing (BSA-Seq) using contrasting bulks of drought-tolerant and drought-sensitive F1-clones. Applying QTLseqr, 15 QTLs comprising 588,983 single nucleotide polymorphisms (SNPs) in 2325 genes associated with drought stress tolerance were identified. SeqSNP analyses in an association panel of 34 mostly starch potato varieties using 1–8 SNPs for each of 188 selected genes narrowed the number of candidate genes down to 10. In addition, ent-kaurene synthase B was the only gene present under QTL 10. Eight of the identified genes (StABP1, StBRI1, StKS, StLEA, StPKSP1, StPKSP2, StYAB5, and StZOG1) address plant development, the other three genes (StFATA, StHGD and StSYP) contribute to plant protection under drought stress. Allelic variation in these genes might be explored in future breeding for drought-tolerant potato varieties.


Whole-genome sequencing and SNP calling in cultivated potato
Next-generation sequencing (2 × 150 bp paired-end) was performed with 120× genome coverage to address the genomic constitution of cultivated potatoes, which are autotetraploid (2n = 4x = 48) and highly heterozygous. Two parental potato cultivars, Euroresa and Albatros, and the contrasting bulks were sequenced, resulting in 35 GB of data per sample. SNP calling was performed against the assembled diploid potato genome DM v4.03 [38]. The parental cultivars Albatros (HROALB) and Euroresa (HROEUR) showed a total of 21,190,336 and 25,098,177 variants, with 18,502,085 and 19,560,602 being actual SNPs, respectively (Supplementary Table S1). The drought-sensitive bulk (HROEXASEN) and the drought-tolerant bulk (HROEXATOL) yielded 24,272,018 and 24,057,945 variants with 21,394,166 and 21,200,308 SNPs, respectively. The difference between variants and SNPs consisted of INDELs. The number of total INDELs ranged from 11.2 to 12.7% of the total variants. To verify the efficiency of SNP calling, our SNP data were compared to the set of 69,011 high-confidence SNPs reported by Hamilton et al. 2011 and the SolCap array with 8303 SNPs [39,40]. Most of the reported SNPs were also identified in our work using Albatros and Euroresa (83.2% and 83.5%, respectively), confirming the high quality of our SNP calling.

Comparison of SNP allele frequencies and drought tolerance associated QTLs
Due to the high number of SNPs per cultivar, ranging from 21.2 to 25 million, windows summing SNP frequencies had to be used for visualization purposes and to obtain a basic overview of the differences in SNP allele frequencies (Fig. 1D,E). Regarding the bulks, the filtering steps reduced the approximately 20 million SNPs to approximately 6 million SNPs that were used for QTL mapping by QTLseqr. Most of these SNPs represent differences from the diploid potato genome assembly that are present in both bulks and are not relevant for the investigations. Only 588,983 SNPs correspond to differences between the drought-tolerant and the drought-sensitive bulk, which shows that most SNPs originate from differences between the diploid reference genome and the tetraploid genome of the potato cultivars. Comparing the drought-tolerant and drought-sensitive bulks from E × A, 15 QTLs were detected: three QTLs each on chromosomes 1, 6 and 12, two QTLs on chromosome 2 and one QTL each on chromosomes 5, 7, 8 and 9 (Fig. 1F).
Under all 15 QTLs together, a total of 2325 annotated genes were located, showing 589,463 SNPs relative to the reference genome and only 23,722 SNPs unique to either the drought-sensitive or the drought-tolerant bulk (Table 1). QTL 3 on chromosome 1 is the smallest with only 41,187 bp, but also the QTL showing the highest gene density with 292 genes per Mbp. The longest QTL is QTL 6 on chromosome 5 with a length of 8,263,689 bp and 753 genes. QTL 10 on chromosome 7, spanning 63,682 bp, covers only a single gene encoding ent-kaurene synthase B. All QTLs together spanned a total of 27.2 Mbp (3.24%) of the potato genome.
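The density and coverage figures quoted for the QTLs follow from simple ratios. The following is a minimal sketch that re-derives them from the numbers stated in this section; the function names are illustrative, and the genome size is back-calculated from the stated 27.2 Mbp and 3.24% rather than taken from the assembly.

```python
def genes_per_mbp(n_genes: int, length_bp: int) -> float:
    """Gene density of an interval in genes per Mbp."""
    return n_genes / (length_bp / 1_000_000)


def implied_genome_size_mbp(span_mbp: float, share_percent: float) -> float:
    """Genome size implied by a span and its stated share of the genome."""
    return span_mbp / (share_percent / 100)


# QTL 6: 753 genes over 8,263,689 bp (values quoted in the text)
qtl6_density = genes_per_mbp(753, 8_263_689)

# QTL 3: 292 genes per Mbp over 41,187 bp, converted to an approximate gene count
qtl3_gene_count = 292 * 41_187 / 1_000_000

# 27.2 Mbp stated to be 3.24% of the genome
genome_mbp = implied_genome_size_mbp(27.2, 3.24)

print(f"QTL 6 density: {qtl6_density:.1f} genes/Mbp")
print(f"QTL 3 gene count: {qtl3_gene_count:.1f}")
print(f"Implied genome size: {genome_mbp:.0f} Mbp")
```

On these inputs, QTL 6 comes out at roughly 91 genes per Mbp, about a third of the density reported for QTL 3, and the density of QTL 3 corresponds to about 12 genes in its 41 kbp interval.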
Comparing all filtered SNPs present in the parents and in either of the bulks showed that, across the whole genome, the cultivar Euroresa contributes more SNPs to the bulks than the cultivar Albatros (Fig. 2A,B). For the drought-tolerant bulk, 38.97% of the SNPs originated from Albatros and 56.70% came from Euroresa. Only 3.38% of SNPs were present in both parents and the drought-tolerant bulk.
For the drought-sensitive bulk, the percentages were very similar: 38.42% of the SNPs were present in Albatros and the drought-sensitive bulk, and 57.39% of the SNPs were present in Euroresa and the drought-sensitive bulk. Only 3.22% were present in all three. For the individual QTLs the situation looked different. The two QTLs whose SNPs were most off balance towards one parent were QTL 1 and QTL 5. For QTL 1, 81.06% of SNPs present in the drought-sensitive bulk and 82.20% of the SNPs in the drought-tolerant bulk originated from Albatros (Fig. 2C,D). On the other hand, only 17.60% of the SNPs for the drought-sensitive bulk and 16.16% of the SNPs for the drought-tolerant bulk came from Euroresa. For QTL 5, 60.38% of the SNPs in the drought-sensitive bulk and 69.15% in the drought-tolerant bulk came from Euroresa, whereas only 39.32% of the SNPs present in the drought-sensitive bulk and 28.61% in the drought-tolerant bulk originated from Albatros (Fig. 2E,F). Differences in the SNP distribution over the whole genome are also visible between Albatros and Euroresa (Fig. 3A,B).
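The parental shares reported above amount to set comparisons between the SNPs of a bulk and those of each parent. The following is a minimal sketch of that bookkeeping, assuming each SNP is keyed by chromosome, position and alternative allele; the function name, the key layout and the toy data are hypothetical and do not reproduce the study's values.

```python
def origin_shares(bulk, parent_a, parent_b):
    """Percentage of bulk SNPs shared with parent A only, parent B only, both, or neither."""
    bulk, a, b = set(bulk), set(parent_a), set(parent_b)
    if not bulk:
        raise ValueError("bulk SNP set is empty")
    groups = {
        "parent_a_only": bulk & (a - b),
        "parent_b_only": bulk & (b - a),
        "both_parents": bulk & a & b,
        "neither_parent": bulk - a - b,
    }
    return {name: 100 * len(snps) / len(bulk) for name, snps in groups.items()}


# Toy data: ten bulk SNPs on one chromosome (hypothetical positions)
bulk = {("chr01", pos, "A") for pos in range(10)}
albatros = {("chr01", pos, "A") for pos in (0, 1, 2, 3, 9)}
euroresa = {("chr01", pos, "A") for pos in (4, 5, 6, 7, 8, 9)}

shares = origin_shares(bulk, albatros, euroresa)
for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```

The four categories are mutually exclusive, so their shares always sum to 100%.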

Association studies using SNP variations detected by SeqSNP analyses
For verification, SNPs tagging candidate genes underlying the 15 identified QTL regions were selected in two rounds. In the first round, only SNPs in genes for drought tolerance described in the literature (coding sequence plus 2000 bp 5′-UTR and 500 bp 3′-UTR) were considered. SNPs had to be exclusively present in either the drought-tolerant or the drought-sensitive bulk. To obtain a higher coverage of candidate genes that could be analyzed, a maximum of 2-3 SNPs per gene were selected for SeqSNP analyses. This resulted in 449 SNPs corresponding to 206 genes. For the second round, only SNPs leading to missense or nonsense mutations between the bulks were selected, but all 2325 annotated genes under the 15 QTLs were included, which resulted in an additional 585 SNPs for another 120 genes. In total, 1034 SNPs representing 324 candidate genes were obtained by this combined strategy to conduct further association studies for drought tolerance in a panel of 34 mostly German starch potato cultivars. The 324 candidate genes for drought tolerance are distributed under all 15 QTL regions. However, the heterozygosity and complexity of the tetraploid potato genome only allowed an oligonucleotide design for 410 SNPs (39.6%, Supplementary Table S2) to go into the SeqSNP analysis, which represented 188 candidate genes (Fig. 3C).
Targeted genotyping by sequencing identified seven SNPs located in six genes that were significantly associated with drought tolerance via the exact Fisher test (Table 2, Supplementary Table S3). When a Kruskal-Wallis analysis was performed instead of the exact Fisher test, four of the genes (Soltu.DM.06G033680.1, Soltu.DM.12G029920.1, Soltu.DM.12G026680.1 and Soltu.DM.12G026450.1) having the highest impact on the phenotypic variance η² were confirmed (Supplementary Table S3), whereas no association was detected for the two other genes, Soltu.DM.06G034890.1 and Soltu.DM.12G026420.1. However, four additional SNPs were significantly associated with drought tolerance, tagging two additional genes (Soltu.DM.12G026610.1 and Soltu.DM.12G027240.1) under QTL 14 on chromosome 12 and two genes (Soltu.DM.02G024960.1 and Soltu.DM.02G025020.1) on chromosome 2, both located under QTL 4.

Table 2. The p-values were obtained from the SeqSNP analyses. Gene ID, position, type of mutation and gene name are given for each significantly associated SNP. SNPs were remapped to the genome assembly and annotation v6.1 for S. tuberosum Group Phureja DM 1-3 516 R44 [38]. Hence, locations and gene IDs are according to the v6.1 genome annotation. The corresponding position in the former v4.03 genome annotation is given in brackets.

Comparing the whole gene sequences with the diploid reference genome assembly SolTub v6.0 revealed further differences specific for Albatros and Euroresa. The acyl-ACP thioesterase A, Soltu.DM.06G033680.1 (StFATA), under QTL 9 on chromosome 6 was tagged by SNP Soltu.DM.06G033680.1_SNP5 at position 57,938,362 bp, which is significantly associated with drought tolerance (p-value = 0.0184). This SNP leads to a missense mutation Asp > Asn in the variety Albatros (Fig. 4A). The acyl-ACP thioesterase at [...] (Fig. 4B). Five other SNPs represent mutations in exon regions of StABP1, two resulting in missense mutations. The first missense mutation is located at 58,820,187 bp, resulting in a conservative replacement Phe > Tyr. However, the second missense mutation, located at 58,821,105 bp, represents a radical replacement Thr > Ile. Two insertions at 58,818,289 bp and 58,820,872 bp, the former in the 5' UTR in Albatros and the drought-sensitive bulk and the latter in intron 4 in Euroresa and the drought-tolerant bulk, were located next to the missense mutations. An additional deletion of the sequence ACG CTA GAC CCC occurred at 58,818,560 bp. Fifteen of the 23 SNPs originated from Albatros and 14 from Euroresa. Ten SNPs were only present in the drought-tolerant bulk, and 13 were present in the drought-sensitive bulk. Both missense mutations coming from Euroresa are only present in the drought-tolerant bulk, while the missense mutation at 58,820,187 bp originating from Albatros can be found in both contrasting bulks. The remaining SNPs resulted in silent mutations.
The gene encoding homogentisate 1,2-dioxygenase (StHGD, Soltu.DM.12G029920.1), on chromosome 12 in QTL 13, is involved in tyrosine breakdown. The homologue in Arabidopsis is AT5G54080. StHGD was targeted by the significant drought tolerance-associated SNP Soltu.DM.12G029920.1_SNP4 (p-value = 0.01492) at position 59,287,789 bp in the coding region of the gene (Fig. 4C). This SNP results in a conservative missense mutation Gly > Val present in Albatros and the drought-sensitive bulk. In total, five SNPs are located in the coding region of StHGD, and two represent missense mutations. The first tagged the gene, and the second was located at 59,284,638 bp and led to a conservative amino acid change Ser > Thr in Albatros and the drought-sensitive bulk. In addition, one insertion and two deletions were present in StHGD. The insertion at 59,287,184 bp and the first deletion at 59,285,139 bp were present in Euroresa and the drought-sensitive bulk. The insertion adds a T and the deletion removes nine nucleotides (TAA CTT ATC). Both mutations were located in noncoding regions. The second deletion, TA within intron 1 at 59,285,167 bp, was present in Albatros and the drought-tolerant bulk. At the StHGD locus, 20 SNPs were identified that fit the criterion of being present in only one of the contrasting bulks. Eight of the 20 SNPs originated from Albatros and 12 from Euroresa. Seven SNPs (five originating from Euroresa, two from Albatros) and one deletion were only present in the drought-tolerant bulk. In the drought-sensitive bulk, 13 SNPs as well as one deletion and one insertion were unique, with eight of the SNPs originating from Euroresa. The 13 SNPs resulted in only two missense mutations as described before, five mutations in the 3'UTR, six mutations in introns and one mutation in the 5'UTR.
The plant-specific transcription factor with a YABBY domain encoded by Soltu.DM.12G026680.1 (StYAB5) under QTL 14 (Chr. 12) showed homology to YAB5 in Arabidopsis (AT2G26580.1) and was targeted by two SNPs that were significantly associated with drought tolerance. In total, 71 SNPs in StYAB5 were present in only one of the bulks (Fig. 4D). Six SNPs were located in the coding region, and four of them caused missense mutations. These missense mutations are located at 56,595,979 bp (Met > Arg), 56,597,336 bp (Arg > Met), 56,598,724 bp (Asn > Asp) and 56,600,893 bp (Thr > Ala). Three of the mutations represent radical replacements, while Asn > Asp can be regarded as a conservative replacement. In addition, a total of 13 deletions and 11 insertions were located in StYAB5. The longest deletion removed AAA CTC TAG AG from the sequence at position 56,596,989 bp. The longest insertion, on the other hand, was located at 56,599,076 bp and added eight thymidines to the sequence. Of all 71 SNPs, 52 originated from Albatros and 19 from Euroresa. The drought-tolerant bulk was characterized by 19 SNPs and the drought-sensitive bulk by 52. The two significant drought tolerance-associated SNPs, Soltu.DM.12G026680.1_SNP1 (at position 56,600,575 bp, p = 0.003859) and Soltu.DM.12G026680.1_SNP2 (at position 56,600,071 bp, p = 0.00041), both represent point mutations in intron 6 of StYAB5. Both SNPs were only present in Albatros and the drought-sensitive bulk.
The genes Soltu.DM.12G026420.1 and Soltu.DM.12G026450.1, encoding protein kinase superfamily proteins (StPKSP1 and StPKSP2), are arranged in a tandem array on chromosome 12 (QTL 14), with homology to AT4G31170. Both genes were targeted by one significantly associated SNP each, but the SNPs differed between the two genes (Fig. 4E). All five SNPs present in the two duplicated genes originated from [...]

Furthermore, Soltu.DM.07G028660.1 (StKS), encoding ent-kaurene synthase B, was also defined as a candidate gene, because it was the only gene present in the narrow QTL 10 on chromosome 7. The homologue in Arabidopsis is AT1G79460. Ent-kaurene synthase B is located at 57,590,958 to 57,597,303 bp (+ strand). In total, 19 SNPs were detected in Soltu.DM.07G028660.1, five of which were in the coding region with a single radical missense mutation due to a C > T exchange at position 57,591,796 bp, changing Ala > Thr (Fig. 4F). In addition, there were two deletions located in noncoding regions in ent-kaurene synthase B. One was present in Albatros and the drought-tolerant bulk at 57,591,066 bp in the 5'UTR, deleting AAGA, and the other deletion was present in Euroresa and the drought-tolerant bulk at 57,594,426 bp in the 3'UTR, deleting a single guanine. All 19 SNPs from Albatros were only present in the drought-tolerant bulk.
For the four additional genes identified by the Kruskal-Wallis test, the SNPs significantly associated with drought tolerance as well as the comparison of the gene sequences of Albatros and Euroresa are given in Supplementary Table S3 and Supplementary Fig. S1.

Discussion
Not all genetic variation present in the tetraploid potato genome can be displayed by the doubled monoploid reference genome [38]. Between 20 and 25 million SNPs were called in comparison to the diploid reference genome, while the number of filtered SNPs present in one of the bulks comprised only a total of 588,983. A tetraploid reference genome would considerably decrease the amount of data that has to be analysed without losing relevant information. The combination of BSA-Seq for QTL analyses with SeqSNP data for association studies proved to be very efficient in the identification of eleven candidate genes significantly associated with drought tolerance in potato (Supplementary Table S4). However, these genes might not be the only genes relevant for the QTL, as neighbouring genes could also be involved depending on the linkage disequilibrium. Even though Albatros represents the drought-tolerant parent, the results clearly show that Euroresa might also contribute alleles relevant for drought tolerance and vice versa. The QTLs for drought tolerance identified in this study partially overlap with QTLs for drought response and drought tolerance in previous publications (Supplementary Table S5) [27,41,42].
Recently, SeqSNP and KASP analyses proved to be very efficient and cost-effective in genotyping South African potato cultivars [43]. In our study, ten genes were tagged by SeqSNP analyses as candidates for drought tolerance, and an eleventh gene was identified as the only gene under QTL 10 on Chr. 7. The acyl-ACP thioesterase A (StFATA, Soltu.DM.06G033680.1) was tagged by a SNP representing drought-sensitive allele(s), as the SNP, even though coming from Albatros, was exclusively present in the drought-sensitive bulk. However, four other SNPs specific for Albatros were only present in the drought-tolerant bulk, indicating additional drought-tolerant allele(s) in Albatros. The drought-sensitive allele(s) carried two missense mutations, Lys > Arg and Asp > Asn. Two classes of acyl-ACP thioesterases, FATA and FATB, can be distinguished [44]. FATA thioesterases prefer unsaturated fatty acids such as 18:1 in vitro. In tomato, a 1.7 times higher FATA expression level under drought stress resulted in an increase in phospholipids as well as remodeling of phospholipids, leading to higher membrane stability and better protection of seeds from desiccation [45,46]. FATB thioesterase (Arabidopsis homolog AT1G08510) is part of cuticular wax biosynthesis [47]. In A. thaliana, wax synthesis increased up to 75% under water deficit [48,49]. Poplars overexpressing acyl-ACP thioesterase B showed better stress tolerance [50].
For ER auxin binding protein 1 (StABP1, Soltu.DM.06G034890.1), SNPs from both parents were present in both bulks. ABP1 functions as an auxin receptor that is involved in signal transduction under abiotic stress [51]. Overexpression of ABP1 leads to increased K+ uptake into guard cells and stomatal closure [52].
For ent-kaurene synthase B (StKS, Soltu.DM.07G028660.1), all SNPs from Albatros were present in the drought-tolerant bulk, indicating a major contribution to drought tolerance from Albatros. Euroresa only contributed a single deletion in an intron region. As part of gibberellic acid (GA) biosynthesis, ent-kaurene synthase directly influences the GA content in plants [53]. Plants with decreased GA concentrations showed better drought tolerance by osmoregulation, inhibited canopy growth, accelerated stomatal closure, reduced xylem expansion and increased root-to-shoot ratio [54]. In rice, ent-kaurene synthase was downregulated under drought stress [55]. In Stevia rebaudiana, inhibition of ent-kaurene synthase with chlorocholine chloride increased drought tolerance [56].
Homogentisate 1,2-dioxygenase (StHGD, Soltu.DM.12G029920.1) is involved in tyrosine and homogentisate breakdown. SNP distribution in the bulks indicated that both parents carried drought-sensitive and drought-tolerant allele(s). Homogentisate can be either oxidized by HGD or used as a precursor for tocopherols [57]. Potato and sweet potato mutants with increased tocopherol content have shown higher tolerance to drought stress [58,59]. Mutations inhibiting HGD from catabolizing homogentisate could lead to higher tocopherol production and thus higher drought tolerance. In a cross-species meta-analysis of progressive drought, HGD showed upregulation under drought stress [60].
YABBY transcription factors modulate morphogenesis, development and stress responses [61]. The small, plant-specific gene family contains five members [62,63]. StYAB5 (Soltu.DM.12G026680.1) was targeted by two SNPs showing significant associations with drought tolerance. Both parental varieties carry drought-sensitive and drought-tolerant allele(s). Arabidopsis yab5-1, a TILLING mutant, has smaller leaves than wild-type Columbia erecta [64,65]. YABBY genes carry two conserved domains: a zinc finger domain in the N-terminal region and a YABBY domain in the C-terminal region [66]. YABBY genes are part of regulatory processes in salt and drought stress resistance [67-70]. MYB-binding sites, response elements for abscisic acid (ABRE), the gibberellin P-box and methyl jasmonate motifs (CGTCA and TGACG) were identified in YABBY genes [71].

Soltu.DM.12G026420.1 and Soltu.DM.12G026450.1 (StPKSP1 and StPKSP2), coding for protein kinase superfamily proteins, show the highest homology to the RAF-like MAPKKK 28. Both genes were tagged by SNPs indicating a drought-sensitive allele, as the SNPs only occurred in the drought-sensitive bulk. In cotton, virus-induced gene silencing of RAF-like MAPKKK enhanced tolerance to drought and salt [72,73]. RAF-like MAPKKK 28 plays a role in embryogenesis and auxin polar transport. Inactivation in A. thaliana resulted in confused localization of the auxin transporters PIN1 and PIN7 [74].
In conclusion, eight of the identified genes (StABP1, StKS, StLEA, StPKSP1, StPKSP2, StBRI1, StYAB5, and StZOG1) address plant growth, which has to be well balanced under drought conditions. The other three genes, StFATA, StSYP and StHGD, contribute to protection under abiotic stress, addressing transpiration by playing a role in fat and wax metabolism, ROS homeostasis and protection against intense light by the production of tocopherols. SNP distribution in the contrasting bulks showed that for most of the eleven genes, both varieties contributed SNPs to both bulks, indicating that both parents carried drought-sensitive as well as drought-tolerant allele(s) for the genes. For two genes, StFATA and StKS, only SNPs originating from Albatros seemed to contribute to drought tolerance. On the other hand, for StPKSP1 and StPKSP2, Albatros contributed the drought-sensitive allele(s). For StSYP, only SNPs from Euroresa contributed to the drought-tolerant bulk. Potato suffers severely under water deficit [79]. Exploiting allelic variation in the eleven identified genes might confer improved drought tolerance to potato.

Plant material and drought stress treatments
Based on the former ranking of 34 potato varieties by the drought tolerance index DRYM, Albatros represents a drought-tolerant and Euroresa a drought-sensitive variety (Supplementary Table S6) [27]. Both selected parents, Albatros (Norika, Groß Lüsewitz, Germany) and Euroresa (Europlant Pflanzenzucht, Lüneburg, Germany), represent German starch potato varieties (2n = 4x = 48, Solanum tuberosum L.) cultivated for industrial purposes with high starch contents of approximately 22% and 21%, respectively. Euroresa is medium late to late in maturity (140-160 days) and Albatros medium early (120-140 days). An F1 progeny was produced by crossing Euroresa (E) with Albatros (A) and maintained by the Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany. DNA was extracted from leaves according to the method of Doyle and Doyle [80].
Three drought trials (big-bag trial B2, Golm poly tunnel; field trial F2, Groß Lüsewitz, rain-out shelter; and pot trial P3, JKI Groß Lüsewitz, rain-out shelter) were performed for the two German parental potato cultivars Euroresa and Albatros and the segregating F1 progeny (E × A, 100 clones). Experimental trials were performed under naturally fluctuating climate conditions in 2014, and early drought stress treatments were applied as previously described in detail [81,82].
In the big-bag trial (B2), drought stress began at the five-leaf stage and was carried on until maturity (> BBCH 90). Drip irrigation was used to water the plants, and drought stress was achieved by prolonging the time interval between two irrigations to limit the irrigation volume to 50% of the water given to the optimally watered plants [82,83]. In the field trials (F2), potatoes were irrigated only once at the start of the experiment to enable the emergence of the plants. For the drought stress simulation, the plants growing under a shelter did not receive any further irrigation until the harvest. The control plants were watered in addition to the normal precipitation to guarantee optimal water conditions. In the pot trials (P3), drought treatment started at the three-leaf stage. The plants endured an ongoing change between drought periods and irrigation. Ten days into the drought phase, plants were irrigated. The quantity was equivalent to three times the volume evaporated by potato plants when half of them showed turgor loss. In the control block, the weight loss due to evaporation was replaced daily to maintain a water capacity of 50%. Details of the irrigation, design and micrometeorological conditions were described previously [83]. Raw data are available from E!dal [84]. Subsequent data analyses were performed in SAS 9.4 (SAS Institute). Drought tolerance was assessed based on the tuber starch yield (SY), which is the product of tuber mass and starch content. To facilitate the comparison of experiments, the relative SY of each genotype (G) was normalised as the deviation of the relative starch yield from the median of the experiment (E) as given in Eq. (1), where RelSY denotes the relative starch yield. A new drought stress index, DRYM (Deviation of Relative Starch Yield from Median), was then used to describe the drought tolerance because, in an artificial data set, this DRYM index proved superior in differentiating between drought-sensitive and drought-tolerant potato varieties independent of the yield potential [81] compared to three other commonly applied drought indices: the stress susceptibility index (SSI) [85], the stress tolerance index (STI) and the geometric mean productivity (GMP) [86]. The DRYM was calculated as given in Eq. (2).

$$\mathrm{DRYM}_{G \times E_i} = \mathrm{RelSY}_{G \times E_i} - \operatorname{median}\left(\mathrm{RelSY}_{G \times E_i}\right) \qquad (2)$$
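The DRYM index can be illustrated with a few lines of code. The following is a minimal sketch, assuming that RelSY values are already available per genotype for one experiment and that the median is taken across the genotypes of that experiment, as described in the text; the function name and the example values are hypothetical.

```python
from statistics import median


def drym(rel_sy: dict[str, float]) -> dict[str, float]:
    """DRYM per genotype: RelSY minus the median RelSY of the same experiment."""
    experiment_median = median(rel_sy.values())
    return {genotype: value - experiment_median for genotype, value in rel_sy.items()}


# Hypothetical relative starch yields of three clones in one experiment
rel_sy_experiment = {"clone_1": 0.80, "clone_2": 0.65, "clone_3": 0.50}
drym_values = drym(rel_sy_experiment)

for genotype, value in drym_values.items():
    print(f"{genotype}: DRYM = {value:+.2f}")
```

Positive values mark genotypes that retain more starch yield under drought than the median genotype of the experiment, negative values mark those that retain less.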

Rolling window analysis
To obtain a basic overview of the differences in SNP alleles, frequency plots were created, comparing the frequencies of the parent cultivars Albatros and Euroresa as well as the drought-tolerant and drought-sensitive bulks. Due to the high number of SNPs per cultivar, ranging from 21.2 to 25 million, windows in which the SNP frequencies were summed had to be utilized for visualization purposes. WindowScanR was used to visualize differences in SNP frequencies [92]. The R package (used with R 4.2.0) offers functions to calculate statistics in sliding windows, using either rolling or position-based windows. For this study, rolling windows were used with the settings win_size = 1,000,000, win_step = 500,000 and funs = c("mean", "sd"). The results were plotted with ggplot2 [87]. In addition, CMplot in R 4.2.0 was used to obtain the SNP density graphs for Albatros and Euroresa.
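The idea behind the rolling-window summary can be sketched in a few lines. This is a simplified illustration, not the WindowScanR implementation: it assumes windows over consecutive values, drops an incomplete trailing window, and borrows the parameter names win_size and win_step from the settings above with much smaller toy values.

```python
from statistics import mean, stdev


def rolling_windows(values, win_size, win_step):
    """Return (start_index, mean, sd) for each full window of consecutive values."""
    summaries = []
    for start in range(0, len(values) - win_size + 1, win_step):
        window = values[start:start + win_size]
        # Sample standard deviation needs at least two values
        sd = stdev(window) if len(window) > 1 else 0.0
        summaries.append((start, mean(window), sd))
    return summaries


# Toy allele frequencies ordered along a chromosome (hypothetical)
frequencies = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
windows = rolling_windows(frequencies, win_size=4, win_step=2)

for start, window_mean, window_sd in windows:
    print(f"start={start} mean={window_mean:.2f} sd={window_sd:.3f}")
```

With a step of half the window size, as in the settings above, neighbouring windows overlap by 50%, which smooths the plotted frequency curves.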

Identifying QTLs
Separation of the segregating individuals into drought-tolerant (minor yield losses under water-limiting conditions) and drought-sensitive bulks was used to define regions with significant differences in SNP allele frequency as quantitative trait loci (QTLs) associated with drought tolerance. The DRYM values of the F1 population E × A had previously been plotted in R with the hist() function, showing an approximately normal distribution, suggesting that the trait was quantitative (Supplementary Fig. S2). CLC Genomics Workbench 21.0 was used for the genome-wide comparison of SNPs between the sequences of bulks and parent cultivars with the diploid reference genome DM v4.03 [38]. Afterwards, the G′ algorithm within the R package QTLseqr v0.7.5.2 [93] was applied to analyse SNP allele frequency differences. QTLseqr offers two statistical approaches for the calculation of QTLs based on SNP allele frequency differences applying NGS-BSA: QTL-seq and G′ [94]. The G′ algorithm calculates normalized G values for every SNP in a tricube smoothing window depending on their distance to the focal SNP of a smoothing window. To ease the identification of QTLs and reduce noise, the SNPs from the bulks were filtered. SNPs with more than 360 or fewer than 20 reads per base pair were excluded. Moreover, SNPs with allele frequencies below 10% and above 90% were excluded, and a GQ value of at least 99 was required for further calculations. The filtering step reduced the starting number of SNPs from approximately 20 million to approximately 6 million SNPs to be used as input for QTLseqr. The settings used for runGprimeAnalysis were windowSize = 1e6 and filterThreshold = 0.1. The results were plotted with a Bonferroni-corrected threshold of p < 0.01.
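The filtering thresholds and the G′ idea described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not QTLseqr code: the thresholds are treated as inclusive bounds, G is computed as the standard likelihood-ratio statistic on a 2 × 2 table of reference and alternative read counts in the two bulks, and the window size is taken as the total window width. All function and parameter names are hypothetical.

```python
import math


def passes_filters(depth, alt_freq, gq, min_depth=20, max_depth=360,
                   min_af=0.10, max_af=0.90, min_gq=99):
    """Apply the depth, allele-frequency and GQ thresholds quoted in the text."""
    return (min_depth <= depth <= max_depth
            and min_af <= alt_freq <= max_af
            and gq >= min_gq)


def g_statistic(ref_a, alt_a, ref_b, alt_b):
    """G = 2 * sum(observed * ln(observed / expected)) over the 2 x 2 read-count table."""
    total = ref_a + alt_a + ref_b + alt_b
    if total == 0:
        raise ValueError("no reads at this SNP")
    rows = (ref_a + alt_a, ref_b + alt_b)
    cols = (ref_a + ref_b, alt_a + alt_b)
    observed = ((ref_a, alt_a), (ref_b, alt_b))
    g = 0.0
    for i in range(2):
        for j in range(2):
            if observed[i][j] > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / total
                g += observed[i][j] * math.log(observed[i][j] / expected)
    return 2 * g


def g_prime(positions, g_values, focal, window_size):
    """Tricube-weighted mean of G within a window centred on the focal position."""
    half = window_size / 2
    weighted_sum = weight_total = 0.0
    for pos, g in zip(positions, g_values):
        d = abs(pos - focal) / half
        if d < 1:
            weight = (1 - d ** 3) ** 3
            weighted_sum += weight * g
            weight_total += weight
    return weighted_sum / weight_total if weight_total else float("nan")
```

A SNP with identical allele ratios in both bulks yields G = 0, whereas fully opposed allele counts give large values; smoothing with the tricube kernel gives nearby SNPs more weight than distant ones inside the window.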

Detailed analysis of QTLs
Scripts for the QTL analysis were written in Perl v5.32.1. These included a script to estimate the sizes of QTLs, the number of genes inside a QTL and the number of SNPs in genes within a QTL. The second step of the QTL analysis was to identify possible candidate SNPs. SNPs suitable as markers for drought tolerance were identified using two different approaches. In the first round, only SNPs exclusively present in one of the two bulks for drought tolerance, either drought-tolerant or drought-sensitive, were extracted. Then, SNPs in genes described in the literature for their influence on abiotic stress response and plant drought tolerance were selected from these. Apart from the coding sequences, SNPs located in the 5′ UTR (2000 bp) and in the 3′ UTR (500 bp) were included in these analyses. In the second round, SNPs were selected by mutation type. Only SNPs resulting in missense (amino acid exchange) and nonsense (premature stop codon) mutations were selected. Candidate genes under the detected QTLs identified by the SNP analyses were visualized with RIdeogram [95]. SNP distributions per QTL and at the whole-genome level were compared using a Perl script. The R package Vennerable was applied to visualize the data as Venn diagrams. Inputs for the comparison were unfiltered SNPs and filtered SNPs after the comparison as described previously [96]. The unfiltered SNPs were used to reduce the number of false unique SNPs per cultivar, which occurred in some cases in which a SNP was present in both cultivars but one SNP did not meet the filtering criteria. CLC Genomics Workbench 21.0 was used for further visualization of SNPs and genes [97].
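The second-round selection by mutation type can be illustrated with the standard genetic code. The sketch below is an assumption-laden illustration, not the Perl scripts used in the study: it classifies a single codon change as silent, missense or nonsense, and the example codons are hypothetical choices that merely encode the named amino acids.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon
    if ref_aa == "*":
        return "stop_lost"  # not a category used in the text, kept for completeness
    return "missense"       # amino acid exchange


# GAT (Asp) -> AAT (Asn) is one possible codon change behind an Asp > Asn exchange
print(classify_codon_change("GAT", "AAT"))
print(classify_codon_change("TAT", "TAA"))
print(classify_codon_change("GGT", "GGC"))
```

Only SNPs falling into the missense or nonsense classes would pass a filter of the kind described for the second round.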

SeqSNP analyses of the association panel
Further verification of SNPs relevant for drought tolerance was achieved by SeqSNP analysis of selected SNPs for association studies in a panel of 34 potato varieties. SeqSNP analysis is a form of targeted genotyping by sequencing performed by LGC (LGC Biosearch Technologies, United Kingdom) using specifically designed probes for next-generation sequencing 98. In total, a list of 1,034 selected SNPs in a BED format file was made available to LGC, giving the exact position of the SNPs with respect to the potato reference genome ST4.03 (http://spuddb.uga.edu/pgsc_download.shtml, accessed on 07.01.2024). To handle the complexity and heterozygosity of the tetraploid potato genome, 200 bp up- and downstream of the targeted SNPs were required as additional information for the allele-specific design of oligonucleotides prior to the targeted sequencing. One or, preferably, two oligo probes were designed for each SNP by LGC (off-targets were not allowed; the annealing temperature for primers was targeted between 45 and 60 °C). Information on the oligonucleotides designed for the SeqSNP analyses, which can be used for future selection for drought tolerance, is given in Supplementary Tables S7 and S8 (only for the SNPs significantly associated with drought tolerance). For the plant material, the plant sample collection kit provided by LGC was used. Between 7 and 9 leaf discs were punched out with a cutting tool and stored individually in a 96-well sample collection plate. Finally, a desiccant sachet was placed on top of the sealed tubes and the collection plate was shipped to LGC for genomic DNA extraction. Using the designed oligos, the regions surrounding the SNPs were sequenced for all 34 cultivars ranked by the DRYM drought tolerance index 27. Sequencing was performed on a NextSeq 500 v2 platform with 150 bp paired-end reads, aiming at 200 × average raw coverage per sample and target. Illumina bcl2fastq v2.17.1.14 was used for demultiplexing of all library groups, clipping of sequencing adapters and quality trimming. Reads containing Ns and above a final length of 130 bp were removed. The quality-trimmed single reads were aligned against the published diploid potato genome DM v4.03, derived from the doubled monoploid potato clone DM1-3 516 R44 38, using Bowtie2 99, and variants were called with Freebayes v1.0.2-16 100. Raw sequencing data, adapter-clipped sequencing data and quality-trimmed reads were delivered in fastq format along with FastQC reports. Alignment files were delivered in BAM format and variant call files in VCF format, together with a spreadsheet containing all target SNPs with information about reference and alternative nucleotides for all provided samples.
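The BED list and the 200 bp flanking requirement described above can be illustrated with a short sketch. It is written in Python for illustration only and is not the workflow used by the authors or by LGC. The function names, the chromosome label `chr06`, the SNP name and all coordinates are hypothetical. The only conventions assumed are the standard BED ones: tab-separated fields, a 0-based start and an exclusive end.

```python
from typing import NamedTuple

FLANK = 200  # bp up- and downstream of each SNP, as stated in the text


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def snp_to_bed(chrom: str, pos_1based: int, name: str) -> BedRecord:
    """BED record covering exactly the single SNP base."""
    return BedRecord(chrom, pos_1based - 1, pos_1based, name)


def design_window(chrom: str, pos_1based: int, name: str,
                  chrom_length: int, flank: int = FLANK) -> BedRecord:
    """BED record covering the SNP plus `flank` bp on each side,
    clipped so that it never runs past either chromosome end."""
    start = max(0, pos_1based - 1 - flank)
    end = min(chrom_length, pos_1based + flank)
    return BedRecord(chrom, start, end, name)


def to_bed_line(rec: BedRecord) -> str:
    """Serialize a record as one tab-separated BED line."""
    return f"{rec.chrom}\t{rec.start}\t{rec.end}\t{rec.name}"


if __name__ == "__main__":
    # Hypothetical SNP at 1-based position 1000 on a 5000 bp sequence.
    print(to_bed_line(snp_to_bed("chr06", 1000, "snp_demo")))
    print(to_bed_line(design_window("chr06", 1000, "snp_demo", 5000)))
```

A SNP far from the chromosome ends yields a 401 bp design window (200 bp + the SNP base + 200 bp). Windows near an end are simply shorter.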

Association studies for drought tolerance
Fisher's exact test was used to calculate significant associations. For this purpose, the association panel was divided into two groups according to the DRYM drought tolerance index: drought-tolerant (1t-17t) and drought-sensitive (18t-34t) cultivars (Supplementary Table S6) 27. The association with drought tolerance was regarded as significant at p < 0.05. As a second method, the Kruskal-Wallis test was used 101. The test was performed with the kruskal.test() function of the R package stats 4.3.2. Afterwards, the effect size η² was calculated using the kruskal_effsize() function from the R package rstatix 0.7.2. Following the release of the DM v6.1 genome assembly, significant SNPs and SNPs in candidate genes were mapped to it to obtain the SNP locations in the new assembly as well 102. For the mapping process, each SNP and its 50 bp flanking sequence in each direction were extracted from the DM v4.03 genome assembly and aligned to the DM v6.1 genome assembly with BLAST 103. The largest structural change between the assemblies affects chromosome 12, which is reversed as a whole. Figures 1D-F and 3A-C use the annotation from the DM v4.03 assembly, and Fig. 4A-F uses the DM v6.1 assembly annotation. To show that most SNPs in the candidate gene originate from one specific parent, the R package Gviz was used 104. The figures are separated into tracks, starting at the top with the genome tracks, generated with the GenomeAxisTrack() function, followed by tracks showing different features, generated with the AnnotationTrack() function. The different tracks were plotted with the plotTracks() function using the settings collapse = FALSE and stacking = "dense". Only SNPs present in either one bulk or the other are shown, whereas SNPs present in both parental varieties and bulks against the diploid potato genome were ignored. For visualization of genes, gene models were used according to the DM v6.1 genome annotation 102.
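The two association statistics named above can be sketched without external libraries. This Python sketch is illustrative only, since the authors worked in R. It shows a two-sided Fisher's exact test on a 2 × 2 table, the Kruskal-Wallis H statistic with midranks and tie correction, and the H-based effect size η² = (H − k + 1)/(n − k) documented for rstatix's kruskal_effsize(), where k is the number of groups and n the total number of observations. The function names are hypothetical, and how alleles or genotypes were coded into the table or into groups is not stated in the text, so the counts in the demo are invented. A p-value for H would additionally need the chi-square distribution, which is omitted here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x: int) -> int:
        # Unnormalized probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total


def kruskal_h(groups: list) -> float:
    """Kruskal-Wallis H with midranks and tie correction.

    Assumes at least two distinct values overall (otherwise the
    tie correction divides by zero).
    """
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n**3 - n))


def eta_squared_h(h: float, k: int, n: int) -> float:
    """Effect size based on H: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)


if __name__ == "__main__":
    # Invented counts: allele present/absent in 17 tolerant vs 17 sensitive cultivars.
    print(fisher_exact_two_sided(12, 5, 4, 13))
    demo = [[1, 2, 3], [4, 5, 6]]
    h = kruskal_h(demo)
    print(h, eta_squared_h(h, k=len(demo), n=6))
```

The Fisher weights are compared as exact integers, which avoids floating-point ties when deciding which tables count as "at least as extreme".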

Plant ethic statement
IUCN guidelines were not applicable as no endangered wild species were included in the research. Experimental research and field studies complied with relevant institutional, national, and international guidelines and legislation.

Figure 1. Drought-tolerant and drought-sensitive bulks selected from the F1-progeny of a cross between the varieties Euroresa and Albatros used for whole-genome sequencing and QTL analyses. (A) Potato plants of the variety Albatros under drought stress (left) and under well-watered conditions (right). (B) Potato plants of the variety Euroresa under drought stress (left) and under well-watered conditions (right). (C) Box plot of the mean DRYM values of the F1-clones forming the drought-sensitive bulk and drought-tolerant bulk (significant difference at a p-value of 1.02e−12). (D) Allele frequency comparison between all SNPs from the parent cultivars Euroresa (blue) and Albatros (red). The graphs were created in R using the package ggplot2. The x-axis shows the genomic position in Mbp for every chromosome. The y-axis shows the mean allele frequency. The R package WindowScanR was used to calculate frequency means for local windows of 1e6 bp. (E) Allele frequency comparison between all SNPs from the HROEXASEN (blue) and HROEXATOL (red) bulks. The graphs were created in R using the package ggplot2. The x-axis shows the genomic position in Mbp for every chromosome. The y-axis shows the mean allele frequency. The R package WindowScanR was used to calculate frequency means for local windows of 1e6 bp. (F) Estimation of differences between the allele frequencies from the drought-tolerant and drought-sensitive bulks by using the G′-value. A threshold was set at p-value = 0.01. The differences in allele frequencies and the G′-values were calculated by using the R package QTLseqr.
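The windowed frequency means mentioned for panels D and E can be sketched as follows. This is a Python illustration only and does not reproduce WindowScanR. It assumes fixed, non-overlapping 1e6 bp windows, which the caption does not confirm, and the function name and input tuples are hypothetical.

```python
from collections import defaultdict


def window_means(snps, window=1_000_000):
    """Mean allele frequency per fixed, non-overlapping window.

    snps: iterable of (chrom, pos_1based, allele_freq) tuples.
    Returns {(chrom, window_start_1based): mean_allele_freq}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of AF, SNP count]
    for chrom, pos, af in snps:
        # 1-based start of the window that contains this position.
        start = ((pos - 1) // window) * window + 1
        acc = sums[(chrom, start)]
        acc[0] += af
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


if __name__ == "__main__":
    # Invented allele frequencies for three SNPs.
    demo = [
        ("chr01", 1, 0.25),
        ("chr01", 1_000_000, 0.75),
        ("chr01", 1_000_001, 0.5),
    ]
    for (chrom, start), mean_af in window_means(demo).items():
        print(chrom, start, mean_af)
```

Plotting one such mean per window against the window position gives a smoothed frequency track of the kind shown in the panels.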

Figure 2. Comparison of filtered SNPs over the whole genome and in selected QTL regions visualized in Venn diagrams by using the R package Vennerable. The red circle represents Albatros, the blue circle Euroresa and the green circle either the drought-sensitive (A) or drought-tolerant (B) bulk. A SNP that is present in Euroresa and one of the bulks is represented by the blue area, a SNP present in Albatros and one of the bulks by the orange area. SNPs present in both parents and the bulk are represented by the green area. (C,D) Comparison of filtered SNPs in the genome region covered by QTL 1. The majority of SNPs under QTL 1 are contributed by Albatros. (E,F) Comparison of filtered SNPs in the genome region covered by QTL 5. The majority of SNPs under QTL 5 are contributed by Euroresa.

Figure 3. Distribution of SNPs and drought tolerance QTL (blue) in the cross E × A with locations of 188 candidate genes, which were further used for SeqSNP analyses in association studies. (A) SNP distribution over the whole genome of Albatros. (B) SNP distribution over the whole genome of Euroresa. (C) Distribution of genes used for SeqSNP analyses (black dots). Genes tagged by SNPs significantly associated with drought tolerance according to Fisher's exact test are shown in red. Ent-kaurene synthase B, the only gene under QTL 10, is marked with a green dot. Locations obtained from the DM v4.03 assembly were used in this figure.

Table 2. Association with drought tolerance in the potato association panel and classification of candidate SNPs. Polymorphisms (SNP ID) are shown with the p-value of Fisher's exact test, the chromosome, the position on the chromosome, the nucleotide of the reference genome at the position of the SNP (Ref) and the alternative nucleotide at the position of the SNP (Alt). Significance levels: p < 0.05*, p < 0.01**, p ≤ 0.001***.

Figure 4. SNP distribution in seven identified genes for drought tolerance in potato. (A) Visualization of SNPs located in the gene Soltu.DM.06G033680.1 (StFATA) using the R package Gviz. The first track shows the location of StFATA (gray) according to the DM v6.1 genome annotation. The following exon track shows the exact location of the exons. The subsequent four tracks give the exact positions of SNPs, insertions and deletions in the parents Albatros and Euroresa, as well as in the drought-tolerant and drought-sensitive bulk. SNPs are shown in different colors depending on their origin: green (Albatros), purple (Euroresa) and red (present in both parents); asterisks mark significantly associated SNPs. (B) Visualization of SNPs located in the Soltu.DM.06G034890.1 gene (StABP1) using the R package Gviz. Tracks and colors are the same as used in Fig. 4A. (C) Visualization of SNPs in the Soltu.DM.12G029920.1 gene (StHGD) by using the R package Gviz. Tracks and colors are the same as used in Fig. 4A. (D) Visualization of the SNPs located in the Soltu.DM.12G026680.1 gene (StYAB5) using the R package Gviz. Tracks and colors are the same as used in Fig. 4A. (E) Visualization of SNPs located in the duplicated genes encoding protein kinase superfamily proteins by using the R package Gviz: Soltu.DM.12G026420.1 (left, StPKSP1) and Soltu.DM.12G026450.1 (right, StPKSP2). Tracks and colors are the same as used in Fig. 4A. (F) Visualization of the SNPs located in the Soltu.DM.07G028660.1 gene (StKS) by using the R package Gviz. Tracks and colors are the same as used in Fig. 4A.

Table 1. Overview of all identified drought stress associated QTLs in Euroresa × Albatros, showing the QTL ID, the chromosome on which the QTL is located, the region of the QTL on the chromosome, the number of genes under the QTL and the number of SNPs for each QTL. Genomic locations are given according to reference version DM v4.03.

SNP ID Chromosome Position Ref Alt p-value Type Gene ID Gene name
Scientific Reports | (2024) 14:5476 | https://doi.org/10.1038/s41598-024-55669-3

Mutations in seven identified candidate genes for drought tolerance in tetraploid potatoes
57,933,541 to 57,938,992 bp (+ strand) has the highest homology to AT4G13050.1 (FATA2) encoding an oleoyl-acyl-carrier protein hydrolase (acyl-ACP thioesterase) in Arabidopsis thaliana. Twenty-eight SNPs are located within StFATA, which apart from two are present in either one of the bulks. Six of the SNPs are located in the coding region, including three missense mutations (Asp > Asn at 57,938,362 bp, Arg > Lys at 57,933,824 bp and Lys > Arg at 57,933,938 bp). The most influential structural changes next to the missense mutations are two deletions at 57,936,818 bp and 57,937,512 bp, deleting ACA and TT, respectively. One insertion at 57,936,803 bp adds a T in intron 4. In StFATA, 27 of 28 SNPs originated from Albatros and only one SNP at 57,934,172 bp compared to the diploid potato genome was present in both parents. Twenty-two SNPs were present in the drought-sensitive bulk, and four SNPs from Albatros were only present in the drought-tolerant bulk. The single SNP occurring in Euroresa at position 57,934,172 contributed a silent mutation in the drought-sensitive bulk. The described two deletions and the one insertion were only present in Albatros and the drought-sensitive bulk. The gene coding for ER auxin binding protein 1 (StABP1), also located under QTL 9, was targeted by Soltu.DM.06G034890.1_SNP2 (p = 0.01669). StABP1 starts at position 58,822,497 and ends at 58,818,364 on chromosome 6 (+ strand) and is homologous to AT4G02980.1. In total, 23 SNPs were detected in StABP1. The significantly associated SNP (p-value = 0.01669) that targeted StABP1 represents an intron variant at 58,819,351 bp (Fig. 
Albatros and were exclusively present in the drought-sensitive bulk: one in Soltu.DM.12G026420.1 and four in Soltu.DM.12G026450.1. Two SNPs were located in the coding region of Soltu.DM.12G026450.1, but only one resulted in a missense mutation with a radical amino acid exchange Pro > Ser at position 56,390,021 bp. Both significantly associated SNPs were located in the 3'-UTR region of the genes: Soltu.DM.12G026420.1_SNP1 (p-value = 0.01336) at position 56,378,682 bp and Soltu.DM.12G026450.1_SNP1 (p-value = 0.02551) at 56,390,498 bp.