Polyploids can be classified based on the meiotic pairing dynamics of their duplicated sets of chromosomes. Allopolyploids are formed by the genomes of more than one ancestor species, which generally implies disomic inheritance, where bivalent and preferential chromosome pairing is observed. Autopolyploids are formed by the genome duplication of a single ancestor species, such that polysomic inheritance is expected, with both bivalent and multivalent random chromosome pairing (Comai 2005; Zielinski and Scheid 2012). In autopolyploids, multivalent formation is one of the conditions for double reduction, a phenomenon in which sister chromatid segments end up in the same gamete (Huang et al. 2019). In order to investigate these complicated inheritance patterns, genetic mapping studies in autopolyploid species still need to deal with the molecular marker system limitation. Even though up to four different alleles (e.g., abcd) are expected in a locus of an outcrossing, autotetraploid individual, biallelic single nucleotide polymorphisms (SNPs) can be characterized, at best, by the dosage of an arbitrary allele, say A, as nulliplex (aaaa), simplex (Aaaa), duplex (AAaa), triplex (AAAa) or quadriplex (AAAA).

Several polyploid species are agriculturally important (Bennett 2004). Particularly, potato is the third most cultivated food crop in the world after rice and wheat, with more than 17.6 million hectares cultivated worldwide in 2018, and a production exceeding 368.2 million tons (FAO 2020). Yet advancement in the breeding of new varieties is still relatively slow. ‘Russet Burbank’, which still accounts for a large amount of the acreage planted in the United States, was developed in the early 1900s (Bethke et al. 2014), and ‘Atlantic’, a major chipping variety grown in the country, was originally released in 1976 (Webb et al. 1978). One of the reasons for such a relatively slow breeding process is the complexity of the potato genome. Cultivated potato, Solanum tuberosum, is a highly heterozygous autotetraploid (2n = 4x = 48) outcrossing crop, which makes it very difficult to combine potentially beneficial characteristics through conventional breeding methods. In fact, many agronomic traits are polygenic, and show continuous distributions (Ghislain and Douches 2020), where genotype-by-environment interaction is usually important and needs to be assessed.

The genome complexity has also impacted the correct assessment of dosage from molecular markers, impeding proper linkage map construction and quantitative trait loci (QTL) mapping. Research has been conducted over the years to develop molecular techniques to facilitate potato breeding. However, since the first molecular genetic maps of potato were published (reviewed by Mann et al. 2011), very few breeding programs have begun to employ marker-assisted selection routinely (Slater et al. 2014). Some of the exceptions to this are for genetically simpler traits such as potato cyst nematode resistance (Schultz et al. 2012) and potato virus Y (Hämäläinen et al. 1997), and other disease resistance loci (Ramakrishnan et al. 2015). Genomics-assisted breeding for more complex traits such as yield is still elusive, and more studies on the genetic architecture of these traits as well as on genome-wide prediction (Endelman et al. 2018) are needed.

The early linkage maps constructed for potato have incorporated qualitatively scored molecular markers (Mann et al. 2011). However, this traditional presence-absence based scoring limited not only the ability to explore the full range of segregating markers, but also to build integrated genetic maps for full-sib populations (Luo et al. 2004). Advances in molecular technologies have been made available to breeders with the publication of the diploid potato genome sequence (Xu et al. 2011), and the subsequent development and release of the Illumina Infinium® 8303 Potato Array (Felcher et al. 2012) through the USDA-NIFA Solanaceae Coordinated Agricultural Project (SolCAP). The original potato array, which was subsequently upgraded to a 12k form, was made up of 8303 SNPs and has been shown to give very good coverage of the published potato genome (Felcher et al. 2012; Sharma et al. 2013). Utilizing Illumina’s technology of dual-colored fluorescents to label the different nucleotides allows for scoring allele dosage (Schmitz Carley et al. 2017; Zych et al. 2019). Along with the development of an array that uses dosage-sensitive SNPs, new analytical methods have recently been developed to effectively utilize this information in reconstructing linkage maps (Hackett et al. 2014; Bourke et al. 2018; Mollinari and Garcia 2019) and mapping QTL (Hackett et al. 2016; Chen et al. 2018) in autotetraploid species.

Novel linkage and QTL mapping approaches (Mollinari et al. 2020; Pereira et al. 2020) have enabled the exploration of the meiotic configurations and the detection of multiple QTL for traits of interest, offering breeding opportunities and insights into the biology of polyploid species. Particularly, Mollinari et al. (2020) presented a detailed characterization of the polysomic inheritance in an even more complex autopolyploid species, the hexaploid sweetpotato, Ipomoea batatas (2n = 6x = 90). In tetraploid potato, double reduction has been documented for several mapping populations (e.g., Bourke et al. 2015), but these studies lack a comprehensive assessment of multivalent formation and preferential chromosome pairing. As haplotypes and thus the patterns of genome transmission from parents to progeny had been inferred, Pereira et al. (2020) were able to estimate additive relationships among full-sibs based on identity-by-descent along the genetic map. Such information was ultimately used in a random-effect model approach, facilitating multiple QTL mapping. In fact, single-QTL models used so far may have hindered the discovery of putative loci underlying the variation of agriculturally important traits in the species. Comparatively, multiple-QTL models have been shown to have increased detection power and to provide the statistical basis for the separation of linked QTL (Pereira et al. 2020).

The advancement of molecular technologies and analytical methods, coupled with newly collected data, have made it valuable to re-analyze the B2721 potato mapping population, derived from the ‘Atlantic’ cultivar (McCord et al. 2011a, b; Schumann et al. 2017). The aims of this study were (i) to perform SNP dosage calling and create a fully phased, integrated linkage map in order to study the meiotic configurations in the B2721 mapping population, and (ii) to carry out multiple QTL-based analyses for yield, foliage maturity, dry matter, specific gravity, skin texture and internal heat necrosis resistance evaluated over four years. This study has allowed us to infer the tetraploid inheritance mechanisms from a linkage analysis perspective, and to better understand these traits at the genetic level in order to bring breeders closer to marker-assisted selection.

Material and methods

Mapping population and field experiment

The B2721 family consisted of 156 progenies and was created by crossing ‘Atlantic’ × B1829-5. This bi-parental mapping population was chosen based on its segregation for internal heat necrosis (IHN), a non-pathogenic physiological disorder characterized by brownish spots in the tuber parenchyma (Yencho et al. 2008). ‘Atlantic’ is susceptible to IHN, whereas B1829-5, an advanced round white clone from the USDA-ARS Beltsville potato breeding program, is resistant. The B2721 population also segregates for a wide range of agronomic traits of interest, and these were our focus in this work, although IHN-related traits will also be considered. In this study, we used B2721 phenotypic data previously collected in 2006, 2007, and 2008 (McCord et al. 2011b, 2011a) as well as additional data collected in 2014 (Schumann et al. 2017).

The field designs for years 2006, 2007, and 2008 were described by McCord et al. (2011a, b). Briefly, in 2006, the population was planted in an unreplicated trial with six plants per clone, whereas in 2007 and 2008, the population was planted in two replications with 10 plants per plot in a randomized complete block design, the same design also used in 2014. All experiments were carried out at the Tidewater Research Station (35°52’20” N, 76°39’33” W) in Plymouth, North Carolina, in a Portsmouth fine sandy loam. In 2014, untreated seed pieces were planted on March 21 and the harvest date was July 15 (a total of 116 days). Standard crop production methods used for potato cultivation in eastern North Carolina were used for all experiments. At harvest, plots were dug using a chain digger and harvested by hand. Tubers were then washed, culled for excessive rots and defects, and phenotype measurements were obtained.

Phenotypic data and analyses

Phenotyping for all IHN-related and agronomic traits in 2006, 2007, and 2008 was described previously (McCord et al. 2011b). In 2014, in addition to IHN incidence (NI) and IHN severity (NS) (Schumann et al. 2017), the B2721 population was evaluated for five other traits: plant yield (PY), foliage maturity (FM), dry matter (DM), specific gravity (SG) and skin texture (ST). PY was measured as total plot weight minus the tubers culled divided by 10 (number of plants per plot) in kg/plant. FM was measured as the area under the senescence progress curve (Shaner and Finney 1977), based on foliage ratings taken on June 9, June 25, and July 2, using a scale of 0 (0% yellowing) to 5 (100% senesced) with half-point increments. For DM determinations, tubers were quartered from stem end to bud end using an Easy Wedger, Model N55550-4 (NEMCO Inc., Hicksville OH). For each plot, one quarter from four different tubers was placed into plastic Whirl-Pak® bags. Samples were then weighed, frozen at –20 °C, lyophilized, and weighed again. DM was calculated as the proportion between dry and fresh weights for each sample, in percentage. SG was calculated using the formula \(SG = w_{{\mathrm{air}}}/(w_{{\mathrm{air}}} - w_{{\mathrm{water}}})\) by using the air weight (wair) and water weight (wwater) for each plot. ST was rated on a 0 (smooth) to 3 (russeted) scale. Finally, NI was calculated as the proportion of tubers showing any sign of IHN over all evaluated tubers, and NS was rated on a 9 (no IHN signs) to 1 (completely necrotic) scale and averaged out for all evaluated tubers (Schumann et al. 2017).

In order to estimate mean-basis broad-sense heritability (H2) (Holland et al. 2003), a joint analysis with data from 2007, 2008, and 2014 (replicated trials) was carried out, where years, blocks within years, genotypes, and genotype-by-year interactions were treated as random effects. Except for 2006 data (single observation), adjusted means were obtained separately for each trait-year combination based on a mixed model with genotypes and blocks as fixed and random effects, respectively, plus the random residual error using ASReml-R v. 4.1.0 (Butler et al. 2018). The phenotype means were named after a trait abbreviation followed by a two-digit year (trait-year combination, e.g., PY06) and later used for QTL mapping purposes. In addition, we fitted simple linear regressions of the form \({\boldsymbol{y}} = \boldsymbol{1}\beta _0 + \beta _1{\boldsymbol{x}} + {\boldsymbol{e}}\) using the lm() function of R (R Core Team 2020), with FM as the explanatory variable x and any other phenotype as dependent variable y from each respective year where FM was measured (i.e., 2007, 2008 and 2014). The vectors of residuals e were used as maturity-corrected phenotypes for new QTL mapping runs, as suggested by Bradshaw et al. (2004).

Progeny variation within each phenotype was explored with boxplots from R package ggplot2 (Wickham 2016). Pearson’s correlations were computed using R package psych v. 1.9.12 (Revelle 2018), and graphical visualization through network and correlation plots were obtained using R packages corrr v. 0.4.0 (Ruiz et al. 2019) and corrplot v. 0.84 (Wei et al. 2017). As an approach to investigate genotype-by-environment (GE) interaction, genotype (G) plus GE interaction (GGE) biplot analysis (Yan and Kang 2003) was carried out using R package GGEBiplots v. 0.1.1 (Dumble et al. 2017) based on the standardized means. All analyses were based on R software v. 3.6.0 (R Core Team 2020).

Genotypic data, dosage calling, and linkage map construction

Genotyping was performed using the Illumina Infinium® 8303 Potato Array (Felcher et al. 2012). The intensity of each fluorescent-labeled nucleotide was read using the Illumina iScan Reader and imported into GenomeStudio. The ordered pair of normalized intensities (x,y) from the two allelic variants was transformed into polar coordinates \(\theta = \frac{2}{\pi }{\mathrm{atan}}\left( {\frac{y}{x}} \right)\) and \(r = \sqrt {x^2 + y^2}\) to proceed with the dosage calling of the parents and offspring using the R package ClusterCall (Schmitz Carley et al. 2017). Uninformative markers as well as those leading to unclassified parents were filtered out resulting in 5599 SNPs.

Allele dosage from these markers was imported into MAPpoly v. 0.1.0 (Mollinari and Garcia 2019) together with their respective chromosome positions (S. tuberosum genome v. 4.0.3). After testing for segregation distortion (~8.9 × 10−6 using Bonferroni correction) and filtering out redundant markers, the pairwise recombination fraction of the remaining 4812 SNPs (~11.6 million pairs) were computed for all possible linkage phases and the most likely configuration was selected. Markers were assembled into 12 linkage groups (LGs) using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Since 91.2% of markers within the groups coincided with their respective chromosomes, groups were formed using exclusively the genome information. Markers were ordered according to the reference genome (Xu et al. 2011) and phased using function ‘est_rf_hmm_sequential’. Briefly, this function sequentially inserted markers into the map using two-point information to eliminate unlikely linkage phases (logarithm of the odds, LOD < 10.0), and for the remaining configurations, the multipoint LOD score is used (LOD < 10.0). If the insertion of a marker still resulted in less than 60 linkage phase configurations, a next marker was inserted and tested in the multiple configurations until the map is completed. In the final step, the multipoint recombination fraction was estimated allowing a global genotyping error of 5%.

Next, the conditional probabilities of the 36 possible genotypes were obtained for every cM of the linkage map for all individuals in the offspring. These probabilities were used to assess the meiotic process that formed the offspring and for further QTL analysis. Probabilistic profiles for pairing behavior were obtained for all LGs for the three possible meiotic pairing configurations, i.e., ab/cd, ac/bd, and ad/bc, where a, b, c, and d denote the homologs in parent ‘Atlantic’. The notation ab/cd indicates that homolog a paired with b, and c paired with d, for example. The same reasoning applies to parent B1829-5 with homologs e, f, g, and h. Also, the probabilistic haplotypes of all individuals in the offspring were reconstructed, and the crossing-over points and respective homologs involved in the exchange were inferred. Using this information and the heuristic algorithm presented by Mollinari et al. (2020), recombination chains were assembled for all offspring individuals, and the homologs involved in each meiosis were assessed. Recombination chains with more than two homologs involved imply that a multivalent formation was present during the meiosis.

QTL mapping and gene search

The conditional genotype probabilities calculated from the inferred genetic map were used to a compute sib-pair additive relationship based on identity-by-descent for every cM position on the map. Using the R package QTLpoly v. 0.2.0 (Pereira et al. 2020), we detected putative QTL positions using the random-effect multiple interval mapping (REMIM) model for each phenotype (trait-year combination) based on a forward-backward procedure. First, putative QTL were consecutively added to the model based on a forward search with a relatively relaxed genome-wide significance threshold (α = 0.20). Then, a backward elimination step was carried out under a more stringent threshold (α = 0.05). The variance components associated with the putative QTL were tested using linear score statistics (Qu et al. 2013), whose associated P values were used in QTL profile visualization with the logarithm of P values [\(LOP = - \log _{10}\left( P \right)\)]. The genome-wide significance was assessed using a score-based resampling method (Zou et al. 2004). A final multiple QTL model was fitted with the selected positions, and QTL genotype best linear unbiased predictions (BLUPs) were used to compute additive allele effects (Kempthorne 1955). QTL heritability (\(h_{{\mathrm{QTL}}}^2\)) was calculated as the ratio between the variance associated with the QTL and the total variance.

In order to compare different approaches, we also ran the fixed-effect interval mapping (FEIM) model (Hackett et al. 2014) as implemented in the QTLpoly package (Pereira et al. 2020). Using the same conditional genotype probabilities, a single-QTL model was used to fit six additive effects (as one additive effect from each parent is taken as reference) at every cM position. The model with a fitted QTL was compared to a null model (with no QTL) using likelihood ratio tests (LRT). LRT statistics were converted into LOD scores, and QTL were declared when the LOD score reached a threshold (α = 0.05) based on 1000 permutation tests (Churchill and Doerge 1994). In both REMIM and FEIM analyses, a window size of 20 cM was used to avoid that two closely linked positions were declared as QTL. Approximate 95% QTL support intervals were estimated by dropping 1.5 from the QTL peak LOP (REMIM) or LOD (FEIM) (Pereira et al. 2020).

Based on the QTL peaks from REMIM, we searched for candidate genes in the region delimited by markers on the left and on the right of the QTL peak or within 200 kbp each side from the QTL peak, whichever was the largest, on the S. tuberosum v. 4.03 genome (ST4.03) using the PhytoMine tool ( by Phytozome v. 12 (Goodstein et al. 2012). Finally, we used the Web Gene Ontology Annotation Plot (WEGO) v. 2.0 (Ye et al. 2018; to count and plot classifications of Gene Ontology (GO) terms for annotated genes within our QTL regions.


Trait correlations and GE interactions

Based on the phenotype means (see Supplementary File S1), variation was found for all evaluated traits over the four years of evaluation (Fig. 1A), indicating that they were segregating in the B2721 population. As the parental means were within the population phenotypic range, we could observe transgressive segregation for all trait-year combinations. PY06 has shown a quite distinct median and broader variation in comparison to the following years, which is likely because there were no replicates in 2006 so that the environmental error could not be modeled. In contrast, FM14 exhibited a narrower variation when compared to the previous years. This was rather expected since in 2014 maturity evaluation was carried out in a shorter time-window (24 days) in comparison to 2007 and 2008 (30 days). The overall change in ST14 scores, when compared to the previous years, was likely due to differences in evaluator scoring. Individuals seemed to be less affected by IHN in 2008, as NS08 and NI08 have shown distinct variations and medians in comparison to the other years. Broad-sense heritability (H2) ranged from 0.39 (DM) to 0.81 (SG), with similarly high values for PY (0.78), FM (0.70) and ST (0.79). For IHN-related traits, NS (0.74) showed a relatively higher heritability when compared to NI (0.66).

Fig. 1: B2721 mapping population phenotypes evaluated for four years (2006–8 and 2014).
figure 1

A Boxplots showing the distribution of full-sib means along with parental means. B Network plot showing positive (blue) and negative (red) significant correlations (\(|r| \ge 0.26\), P < 0.001). C GGE biplot showing the distribution of full-sibs (green) and phenotypes (blue) on the two first principal components. Traits: plant yield (PY), foliage maturity (FM), specific gravity (SG), dry matter (DM), skin texture (ST), and internal heat necrosis severity (NS) and intensity (NI). The phenotypes were named after a trait abbreviation followed by a two-digit year (e.g., PY06).

The correlation among variables was explored using a network plot (Fig. 1B), where path color and transparency represent the signal and strength of the correlation, respectively. The variable positioning is given by multidimensional scaling that leverages the magnitude of the correlations, i.e., the closer the variables, the higher the absolute values of their correlations (Ruiz et al. 2019). In general, the phenotype of the same trait evaluated in different years clustered together. As expected, NS was close to NI as well as DM to SG, thus indicating a high correlation, because each one of these trait pairs is a different way of assessing the same characteristic (IHN response and solids content, respectively). FM appeared centrally in this plot as it was correlated with most variables either negatively (e.g., with PY and NI) or positively (e.g., with SG, DM, and NS). Note that for this network plot, a subset of 144 full-sibs with complete observations among all traits was ultimately used. For correlation estimates based on pairwise complete observations, please see Supplementary Fig. S1 and Supplementary File S2. Even though data from 2006 was originated from a single replication, they were in general agreement with the adjusted means from the other years. DM observations were the least correlated ones across years in comparison to all the other trait sets (see Supplementary Fig. S1).

GGE biplot was used to investigate the existence of GE interaction (Fig. 1C), where each year was considered an environment. The distribution of full-sibs along the first two principal components, which accounted for 43% of the variation, did not show any clustering. This was expected as mapping populations usually consist of non-selected individuals. However, some full-sibs were found to be more scattered in the top-left (fourth) quadrant, indicating the most affected individuals by IHN (i.e., high NI). In contrast, more individuals were found to be less scattered in the bottom-right (second) quadrant, where NS vectors lie, indicating smaller differentiation when full-sibs were less susceptible to IHN. Relatively shorter vector lengths of PY, ST, and DM08 indicated limited capacity to discriminate genotypes using these traits, in comparison to longer vectors. Only relatively weak GE interaction was observed, since phenotypes from different years consistently exhibited the same vector orientation for each trait, with small angles between vectors. Greater angles between vectors of phenotypes from 2006 (such as DM06, SG06, and NS06) and from the remaining years should be treated carefully, as these 2006 measurements were based on single plots. The same vector orientation for SG and DM indicated that these traits were positively correlated, whereas the opposite orientations for NS and NI vectors revealed their negative correlations.

Linkage analysis and inheritance mechanisms

The normalized intensities from the Illumina Infinium® 8303 Potato Array were obtained for the B2721 population (see Supplementary File S3). The dosage calling procedure yielded 5599 informative SNPs with 1.05% of missing data. Fourteen percent of the SNPs were filtered out due to segregation distortion and redundancy resulting in 4812 informative markers: 1311 (27.3%) simplex, 1234 (25.6%) double-simplex, and 2267 (47.1%) multiplex (i.e., from duplex to quadriplex; see Supplementary Fig. S2). The complete linkage map (see Supplementary Table S1, Supplementary Fig. S3, and Supplementary File S5) consisted of 12 LGs with eight haplotypes each, with four homologs per parent. A total of 4285 markers spanned 1629.99 cM in length, with an average density of 2.64 SNPs/cM. Linkage group lengths ranged from 106.20 cM (LG 5) to 205.88 cM (LG 1), with an average LG length of 135.83 cM. The scatterplots of the physical distance in S. tuberosum genome v. 4.03 versus genetic distance in the 12 LGs in the B2721 population map is shown in Supplementary Fig. S4.

The probabilistic pairing profiles showed no preferential pairing between homologs in both parents (see Supplementary Fig. S5). Among all meiotic configurations, only 7.3% and 9.8% were inconclusive for parents ‘Atlantic’ and B1829-5, respectively. From the remaining configurations (Fig. 2), the percentage of cases with no crossing-over varied from 6.8% (LG 1, parent ‘Atlantic’) to 40.7% (LG 11, parent B1829-5), with 24.2% on average. Configuration involving two chromosomes with at least one crossing-over (i.e., bivalent configurations), were the most abundant varying from 51.0% (LG 11, parent B1829-5) to 72.4% (LG 9, parent ‘Atlantic’), with a mean of 62.3%. Multivalent configurations, i.e., involving three or four homologs, ranged from 2.2% (LG 8, parent B1829-5) to 39.2% (LG 1, parent ‘Atlantic’), with 13.5% on average.

Fig. 2: Distribution of the number of homologs involved in a recombination chain.
figure 2

Percentage of cases where no crossing-over or crossing-over involving two, three or four homologs were detected across ‘Atlantic’ (top) and B1829-5 (bottom) 12 linkage groups.

Multiple QTL mapping and candidate genes

From the REMIM analysis, a total of 21 QTL were mapped in 15 out of 27 evaluated trait-year combinations (Table 1; see Supplementary Fig. S6). One QTL was detected for 10 phenotypes each (for PY06, PY08, SG07, SG08, ST07, ST08, NS06, NS08, NI06, and NI08). The other five phenotypes have shown two (for FM07, FM14, SG06, and NI07) or three (for FM08) QTL each. No QTL were mapped for 12 phenotypes (PY07, PY14, SG14, DM06, DM07, DM08, DM14, ST06, ST14, NS07, NS14, and NI14), mostly from years 2006 and 2014. Ten QTL were mapped on LG 5, four on LG 1, three on LG 7, and one on LGs 2, 3, 4, and 9 each. QTL heritability (\(h_{{\mathrm{QTL}}}^2\)) ranged from 7.1% (FM08 on LG 7) to 64.0% (FM08 on LG 5), including five QTL with higher (\(64.0 \ge h_{{\mathrm{QTL}}}^2 \ge 36.0\)), 10 with moderate (\(29.8\% \ge h_{{\mathrm{QTL}}}^2 \ge 20.4\%\)), and the remaining six QTL with lower (\(18.6\% \ge h_{{\mathrm{QTL}}}^2 \ge 7.1\%\)) heritability (Table 1; Fig. 3). In comparison to REMIM results, FEIM detected 17 QTL from 12 phenotypes (see Supplementary Fig. S8). On the one hand, FEIM did not identify QTL previously identified for PY06 (on LG 5), FM08 (on LGs 1 and 7), SG06 (on LG 3), NI06 (on LG 1), and NI07 (on LGs 1 and 5). On the other hand, FEIM identified additional QTL for SG06 (on LGs 8 and 10), ST08 (LG 9), NI14 (on LG 5) (see Supplementary Table S4). The proportion of variance explained (PVE) by FEIM-derived QTL ranged from 13.0% to 52.7%.

Table 1 Random-effect multiple interval mapping (REMIM) for B2721 traits evaluated for four years (2006–8 and 2014).

With 10 QTL on the proximal half of LG 5, this region was characterized as a QTL hotspot. The major QTL on LG 5 for FM was detected for all three years of evaluation (2007–8 and 2014). However, LOP profiles (see Supplementary Fig. S6) showed suggestive QTL (at a lower LOP that did not reach the threshold), for instance, on LG 5 for PY06, SG14, NS14, and NI14. For the five traits with QTL on LG 5 (PY, FM, SG, NS, and NI), most QTL peaks were found at 29 cM and support intervals ranged from 0 to 50 cM. Figure. 4 depicts the additive allele effects of co-located QTL on LG 5 for these different traits evaluated in 2008. It is interesting to note that for the QTL sets PY08.1/NI08.1 and FM08.2/SG08.1/NS08.1, major allele effects appeared in the same direction within sets, but in opposite directions between them. As these QTL on LG 5 were responsible for explaining a significant portion of the variation for these traits (Table 1), the QTL-based predicted means were highly correlated, either positively (PY with NI, and FM with SG and NS) or negatively (PY/NI with FM/SG/NS) (see Supplementary Fig. S11). It is worth mentioning that selecting towards two out of four alleles per parent will tend to favor different sets of phenotypes depending on the alleles. For example, selecting allele pair ac from ‘Atlantic’ and eg from B1829-5 will likely increase PY and NI and decrease FM, SG, and NS. Additive effects for all QTL can be found in Supplementary File S6.

Due to such a large QTL effect on LG 5 for FM and its consistent impact on most traits, additional QTL detection was carried out using maturity-corrected phenotypes (i.e., virtually free of FM influence; see Supplementary Fig. S10). We found that the slopes β1 were highly significant for all traits (***P < 0.001), except DM14 (*P < 0.05), and DM08, PY14, SG14, and ST14 (P > 0.1) (see Supplementary Table S2). REMIM analyses on maturity-corrected phenotypes resulted in five QTL (see Supplementary Fig. S7). A QTL for ST08 co-located with the one previously identified on LG 4. Other four QTL were newly identified: one QTL from each SG07, SG08, and DM07 were closely located on LG 2, and another QTL for DM07 was detected on LG 6 (Table 1; Figs. 3 and 4). FEIM analyses returned eight QTL, from which two were previously identified for ST08 (LGs 4 and 9), and other six were newly detected for SG07 (LG 2), SG 08 (LGs 2 and 5), DM07 (LG 6), DM08 (LG 11), and NI14 (LG 4) (see Supplementary Table S4 and Supplementary Fig. S9).

Fig. 3: QTL identified for seven traits evaluated for four years (2006–8 and 2014).
figure 3

Dots are plotted according to QTL peak location along the linkage groups. Color scale represents P values, and sizes are proportional to QTL heritability (\(h_{{\mathrm{QTL}}}^2\)). Traits: plant yield (PY), foliage maturity (FM), specific gravity (SG), dry matter (DM), skin texture (ST), and internal heat necrosis severity (NS) and intensity (NI). * indicates maturity-corrected phenotypes.

Fig. 4: Additive allele effects for co-located QTL on LG 5 for five traits evaluated in 2008.
figure 4

Letters represent each parental haplotype from the linkage map (‘Atlantic’ = a through d, B1829-5 = e through h). Traits: plant yield (PY), foliage maturity (FM), specific gravity (SG), and internal heat necrosis severity (NS) and incidence (NI).

As the QTL support intervals comprised too large of a region to manually curate (>13 Mb per QTL, on average), we focused our candidate gene search on the region within the markers flanking the QTL peak (~400 kb, on average) (see Supplementary Table S3). A total of 533 annotated genes on the S. tuberosum v. 4.03 genome were within these QTL regions, with 342 genes assigned to at least one GO term (see Supplementary File S7 and Supplementary Fig. S12).


The recombination landscape of B2721 mapping population

The ability to assess the intricate recombination landscape of an outcrossing, autopolyploid species encompass the construction of a highly informative map. That is, the parental maps have to be integrated between the two parents and must differentiate each one of the parental haplotypes. Two genetic maps have been built previously for the B2721 population. The first map used amplified length polymorphisms (AFLP) and a few simple sequence repeat (SSR) markers, and consisted of two separate linkage maps, one for each parent, with limited integration within parents (McCord et al. 2011b). The second map used a dosage-sensitive SNP array, which led to full integration and haplotyping in the B2721 population (Schumann et al. 2017). However, this map contained 3427 SNPs and spanned 1397.86 cM, which is relatively smaller but less saturated than our current map with 4285 SNPs distributed along 1629.99 cM. The ability to incorporate more markers with MAPpoly was possibly due to (i) the multipoint-based algorithm that considers all parental homologs and marker dosages simultaneously, and (ii) the genome-assisted approach for marker ordering. Schumann et al. (2017) used the software TetraploidSNPMap (Hackett et al. 2016), which relies on (i) simplex markers to assemble homologs, bridging these homologs afterward using multidose markers, and (ii) two-point genetic distances to obtain the marker order by using the multidimensional scaling (MDS) algorithm (Preedy and Hackett 2016). Increased marker saturation provides better support to estimate the conditional probabilities, which are used to infer pairing profiles during meiosis in a segregating population and to map QTL.

The probabilistic pairing profiles showed no preferential pairing between homologs in both parents (see Supplementary Fig. S5), likely due to the cultivated potato’s autotetraploid nature. Consequently, alleles present in different homologs in the same homology group had an equal chance to recombine with alleles from other loci, amplifying the range of possible genotypes when compared to diploids or allotetraploids. In addition, most of the parental meiotic configurations based on the B2721 population were inferred as bivalents (62.3% on average) since they involved two homologs exchanging segments during the metaphase I (Fig. 2). These results are in accordance with recent findings published by Choudhary et al. (2020). These authors presented a map of meiotic stages in S. tuberosum in one diploid and three tetraploid varieties using fluorescence in situ hybridization (FISH). They concluded that bivalent chromosome associations are the most common in metaphase I of tetraploid potatoes. Likewise, the broad range of multivalent signature rates observed in our study (2.2~39.2%) was also observed by Choudhary et al. (2020) (7~48%). Similarly, as their observations differed depending on variety and chromosome, in our study, the variation occurred by parents and by linkage groups. Thus, our meiotic assessment, although not as precise as a cytological analysis and prone to sampling errors, can be very helpful, serving as a proxy to evaluate meiotic configurations by using a straightforward expansion of linkage analysis.

Multiple QTL mapping for yield, quality, and IHN-related traits

Single-QTL models were previously applied to the first two linkage maps of B2721. These models were based on an interval mapping model, and the 95th percentile (α = 0.05) of maximum LOD scores from 1000 permutation tests per phenotype was used to declare QTL, with several additional suggestive QTL (below this threshold) also being recorded (McCord et al. 2011b, a). One of the issues with the lack of integration within and between parental maps is the same region being declared as a QTL in separate parental haplotypes. For instance, for plant yield, McCord et al. (2011b) listed a total of seven QTL across three years, PY06 (one), PY07 (five), and PY08 (two), from both parental maps, in addition to 11 suggestive ones. Our multiple-QTL analyses resulted in a single co-located QTL region on LG 5 for each PY06 and PY08 (Table 1; Fig. 3), with suggestive QTL for PY07 and PY14 (see Supplementary Fig. S6). QTL on LG 5 for tuber yield have also been reported in both diploid (DM 1-3 516 R44 × RH89-039-16, n = 96) (Manrique-Carpintero et al. 2015) and tetraploid (‘Liberator’ × W4013-1, n = 110) (Rak et al. 2017) potato mapping populations.

For foliage maturity, McCord et al. (2011b) and Schumann et al. (2017) described major QTL for years 2007-8 on LG 5 as well as the one on LG 7. In the current study, another QTL on LG 1 was identified for FM08, likely due to the increased detection power of REMIM. Despite the differences regarding phenotypic distribution for FM14 (Fig. 1A), we were able to find QTL on LGs 5 and 7 (Fig. 3). This LG 5 region is well known as a main source of QTL in potato. Prior to the release of the SNP potato array, a major QTL for plant maturity had already been reported on chromosome 5, also linked to a late blight (Phytophthora infestans) resistance locus (reviewed by Danan et al. 2011). More recently, additional studies noticed that the same region on LG 5 also underlies the variation of several tuber quality-related traits in addition to resistance to early blight (Alternaria tenuis) and Verticillium wilt (Verticillium spp.) (Massa et al. 2018), and of tuber-Cd and Zn concentration (Mengist et al. 2018), for example. For IHN-related traits, three QTL were identified on LG 5, one for each NS08, NI07, and NI08, using REMIM. FEIM identified an additional significant QTL on LG 5 for NI14 (see Supplementary Table S4 and Supplementary Fig. S8). Previously, IHN-related QTL had been reported as suggestive only (Schumann et al. 2017), and this difference in significance is likely due to improved map construction.

A major QTL on LG 5 was also found to co-locate for SG07 and SG08, while it appeared only as suggestive for SG14 and with no evidence for SG06, which in turn showed one QTL on each LGs 2 and 3. The evaluation of maturity-corrected phenotypes helped to detect this co-located QTL for DM07, and SG07 and SG08 on LG 2. An alternative to perform QTL discovery free of maturity influence, though, would be to use materials that do not show much variation for this trait, i.e., crosses between either late or early maturity cultivars. Specific gravity and dry matter are supposedly related to solid content in the tuber. In fact, the correlation among their adjusted means was generally high (up to 0.81 between DM07 and SG07; see Supplementary Fig. S11 and Supplementary File S2), but the only set of co-located QTL between any SG and DM measures was that one on LG 2. It is worth mentioning that their broad-sense heritability values (H2) differed greatly (0.39 for DM and 0.81 for SG), so that the ability to detect QTL for DM may have been affected because a smaller proportion of the total variance could be attributed to genetic variance. Other mapping populations have revealed QTL on LG 5 for DM (12601ab1 × ‘Stirling’, n = 227) (Bradshaw et al. 2008) and SG (Manrique-Carpintero et al. 2015), where somewhat distinct heritability values were found (0.81 for DM and 0.46 for SG).

Skin texture was a trait that diverged the most from the other traits regarding QTL locations, specifically for not showing enough statistical evidence for QTL on LG 5 (see Supplementary Fig. S6). In diploid potatoes, skin texture is believed to be controlled by three loci (Jong 1981). In our B2721 population, we detected two QTL (one for ST07 and ST08 each), in regions on LGs 4 and 9 where no other traits have shown evidence for QTL in our study. That is, the apparent strong correlation observed between ST07 and FM07 (r = −0.44, p < 0.001), for example, could not be attributed to genetics, as correlation between QTL-based predicted means for these two traits was close to zero (r = −0.07, p = 0.40; see Supplementary Fig. S11).

When performing QTL discovery in outbred polyploid populations, in addition to genome complexity, one must keep two factors in mind. First, in F1 populations, the fact that two parental genotypes are contrasting does not directly imply that major QTL will be detected. For instance, considering a biallelic QTL, a cross between two contrasting parents, QQQQ × qqqq, would not segregate (i.e., 100% QQqq). In order to be detectable, alleles need to be contrasting within a parent, e.g., Qqqq × qqqq or Qqqq × Qqqq in the simplest cases, resulting in segregations such as 50% Qqqq: 50% qqqq or 25% QQqq: 50% Qqqq: 25% qqqq, respectively. We speculate that the allele effects of undetectable QTL towards IHN resistance, for example, did not differ significantly within each parent. For the QTL on LG 5 for NI08, the allele contributions to the population mean (~6.1%) were ±2.5% at most (Fig. 4). In other words, these effects would not change the fact that most individuals would be less susceptible (NI < 10%) as resistance could have been already conferred by these undetectable QTL.

If there are segregating QTL alleles, another factor that influences detection is small population sizes that may not allow separating signal from noise properly. Consequently, thresholds for declaring a QTL are usually required to be more stringent in order to avoid false positives. In general, tetraploid potato QTL mapping studies have been performed in relatively small population sizes (~150 or fewer individuals). In these cases, even very low P values and high QTL heritability estimates, such as those found for FM07, FM08, and SG08, should be treated carefully as they can be relatively biased due to sampling. Finally, reduced population sizes also limit the ability to test closely linked versus pleiotropy for co-located QTL across different traits or environments accurately.

Candidate genes within QTL regions

Among the candidate genes (see Supplementary File S7), GO terms that overlapped DNA and protein binding functions with the regulation of biological processes were of particular interest. In total, 19 genes were found to have the same two GO terms (GO:0003677 and GO:0006355) attributed to the S. tuberosum CYCLING DOF (DNA-binding with one finger) FACTOR 1, StCDF1 (PGSC0003DMG400018408, ST4.03ch05:4539029..4541329), which has been described as the transcription factor underlying the major QTL for maturity on LG 5 (Kloosterman et al. 2013). The SolCAP markers around the QTL peak for FM08 (c2_22959 at 4,434,048 bp and c2_23052 at 4,906,728 bp) were found to be closer to StCDF1 than the flanking markers for FM07 and FM14 (c2_11829 at 4,041,510 bp and c2_22986 at 4,279,075 bp), and the marker mentioned by Massa et al. (2015) (c2_22986 at 4,105,142). The same LG 5 region also includes yield-related QTL, where we found the Potato homeobox 1 gene, POTH1 (PGSC0003DMG400013493, ST4.03ch05:9248372..9258054), that appears to regulate tuber development in potato via StBEL5-POTH1 heterodimer (Rosin et al. 2003). Finally, QTL underlying IHN-related traits were also found in the same LG 5 hotspot, where a homeobox-associated leucine zipper, StHOX40 (PGSC0003DMG400030494, ST4.03ch05:4264626..4267698), was identified. This gene was found to be either up or downregulated under heat and drought stresses in potato, respectively (Li et al. 2019), which are believed to be two important triggers of IHN (Yencho et al. 2008).

Other genes, such as StERF6 (PGSC0003DMG400016651, ST4.03ch06:33827698..33832740) and StNAC046 (PGSC0003DMG400031266, ST4.03ch05:4946232..4948779), were characterized as transcription factors from AP2/ERF and NAC families, respectively, which have been implicated in a wide range of regulatory processes in plants (Olsen et al. 2005; Mizoi et al. 2012). A gene from the NAC family, StNAC032 (PGSC0003DMG400002824, ST4.03ch04:245125..246607), was retrieved from the LG 4 region of a major QTL for skin texture. Several genes from the NAC family, including StNAC032, were found to be expressed in the tuber skin and appeared to be involved in suberin and associated wax biosynthesis (Soler et al. 2020), which could contribute to skin texture, as well as be involved with the response to drought and other biotic and abiotic stresses (Singh et al. 2013).

In plants, a single QTL region may be found to underlie the variation of more than one phenotype at once, which can potentially explain the correlation among traits. A QTL hotspot has been identified in the short arm of chromosome 5 of potato (Bradshaw et al. 2004; Danan et al. 2011; Li et al. 2018), and our B2721 population stuck to the rule. A major QTL for maturity was found to overlap loci underlying the variation for other traits such as yield, specific gravity, and IHN-related resistance, as described previously (McCord et al. 2011a, b; Schumann et al. 2017) and corroborated with newly analyzed phenotypic data. Pleiotropy is one possible explanation of physiologically correlated traits, such as maturity, yield, and specific gravity (as a proxy of solid content, thus contributing to yield). Transcription factors are known to have this property in which a single molecular function may be involved in multiple biological processes (He and Zhang 2006). For instance, a DNA-binding element from the HAP complex, Ghd8, was found to play pleiotropic roles in regulating yield, flowering, and plant height in rice (Yan et al. 2011). On the other hand, the presence of QTL for yield-related traits and pathogenic (e.g., late blight) and non-pathogenic (e.g., IHN) potato disorders in the same hotspot on LG 5 could well be due to closely linked genes. Another example from rice has suggested the multigenic nature of a major QTL for grain yield under drought, where the central role of a NAM transcription factor, OsNAM12.1, in concert with extra co-localized genes, influenced other traits such as root and panicle branching, and transpiration efficiency (Dixit et al. 2015).

Final considerations

‘Atlantic’ has been widely adopted as a chipping variety, mostly due to its high specific gravity and yield, and in spite of its medium-late maturity and susceptibility to internal heat necrosis (Webb et al. 1978). The genetic basis of the correlation between these traits was dissected here by means of multiple QTL models in combination with a dense, fully phased genetic map. We found that not only a major QTL on LG 5 underlies the variation of these traits concomitantly, but also that the additive effects attributed to each parental haplotype appear in such a way where, in general, when an allele increases SG and FM measurements (thus towards early maturity), it decreases PY and IHN resistance (hence towards low NS and high NI), and vice-versa. StCDF1 has been pointed out as playing a central role in this major QTL for maturity, given its day-length signaling function that regulates the tuberization pathway (Kloosterman et al. 2013). In addition to low rainfall in the early season and high temperatures, longer days to harvest are known to increase IHN incidence (reviewed by Yencho et al. 2008). Similarly, a higher number of rain events seemed to contribute to the occurrence of IHN, because rainfall interferes, among other variables, with light intensity (Sterrett et al. 1991). Therefore, due to its circadian role, we believe that StCDF1 indirectly participates in IHN progression.

Our list of candidate genes can be used in a further dissection of QTL regions. Ideally, factors other than StCDF1 could be targeted in benefit of yield and specific gravity increase at the expense of late maturity. One way of doing so is to focus on linked genes (e.g., POTH1 for yield) or other non-linked QTL (e.g., the one on LG 2 for specific gravity). Moreover, the evaluation of germplasm collections in order to assess haplotypic diversity could assist the development and implementation of markers to monitor loci of interest, as previously performed for StCDF1 in chromosome 5 (Gebhardt et al. 2004; Hardigan et al. 2017). Marker-assisted breeding for quantitative, more complex traits could be implemented upon validation of these QTL neighboring markers in breeding populations of tetraploid potato. DNA markers also can be employed in genome-wide prediction in this crop (Sverrisdóttir et al. 2017; Enciso-Rodriguez et al. 2018). In this case, both meiotic and trait genetic architecture assessments derived from our analyses can be particularly useful in refining simulations of potato and other autopolyploid crop breeding programs integrating genomics-assisted breeding (Gaynor et al. 2020).