Genetic analysis of phenylpropanoids and antioxidant capacity in strawberry fruit reveals mQTL hotspots and candidate genes

Phenylpropanoids are a large class of plant secondary metabolites, which play essential roles in human health mainly associated with their antioxidant activity. Strawberry (Fragaria × ananassa) is a rich source of phytonutrients, including phenylpropanoids, which have been shown to have beneficial effects on human health. In this study, using the F. × ananassa ‘232’ × ‘1392’ F1 segregating population, we analyzed the genetic control of individual phenylpropanoid metabolites, total polyphenol content (TPC) and antioxidant capacity (TEAC) in strawberry fruit over two seasons. We have identified a total of 7, 9, and 309 quantitative trait loci (QTL) for TPC, TEAC and for 77 polar secondary metabolites, respectively. Hotspots of stable QTL for health-related antioxidant compounds were detected on linkage groups LG IV-3, LG V-2 and V-4, and LG VI-1 and VI-2, where associated markers represent useful targets for marker-assisted selection of new varieties with increased levels of antioxidant secondary compounds. Moreover, differential expression of candidate genes for major and stable mQTLs was studied in fruits of contrasting lines in important flavonoids. Our results indicate that higher expression of FaF3′H, which encodes the flavonoid 3′-hydroxylase, is associated with increased content of these important flavonoids.

Fruits and vegetables are a major component of the human diet, promoting healthy ageing by reducing the risk of a wide array of chronic and degenerative diseases [1][2][3][4]. Strawberry is a crop of particular interest for a healthy diet as the fruit is a rich source of nutrients with high antioxidant activity 5. Strawberry fruit is particularly rich in flavonoids, a subgroup of polyphenols, which are one of the most extensive and heterogeneous families of secondary metabolites [6][7][8]. Flavonoids are involved in several important physiological processes, such as regulation of auxin transport, male fertility, biotic and abiotic stress responses, and fruit and flower pigmentation [9][10][11][12]. In general, flavonoids are classified into six subclasses according to their structure: flavonols, flavones, flavanones, flavanols or flavan-3-ols (flavan-3-ol monomers, proanthocyanidins, and theaflavins), isoflavones and anthocyanidins. Polyphenol synthesis occurs through the general phenylpropanoid pathway (Fig. 1), in which aromatic amino acids (phenylalanine and tyrosine) give rise to thousands of molecules with a multiple phenol backbone by the action of enzyme superfamilies (ligases, oxygenases, oxidoreductases, transferases, etc.) 6,7. The initial steps, catalyzed by phenylalanine ammonia lyase (PAL), cinnamate-4-hydroxylase (C4H) and 4-coumaroyl CoA-ligase (4CL), are necessary for the formation of the phenylpropanoid monomers that supply the basis for all resulting phenolic compounds 7 (Fig. 1).
Scientific Reports | (2020) 10:20197 | https://doi.org/10.1038/s41598-020-76946-x www.nature.com/scientificreports/
During strawberry fruit development and ripening, important dynamic fluctuations in phenylpropanoid content are reported 8. Halbwirth et al. 13 observed two activity peaks during fruit ripening for most of the enzymes involved in flavonoid biosynthesis. The first peak in immature fruit coincides with the accumulation of astringent tannins (i.e. flavan-3-ols and derived proanthocyanidins), while in the later stages of strawberry ripening, a redirection of flavonoid biosynthesis towards flavonols and anthocyanins is observed, which increases fruit attractiveness to seed-dispersing animals 8,14. More than 25 different anthocyanins have been detected in fruits of different strawberry cultivars, although in general the pigmentation of the receptacle is due to glycosylated pelargonidins and a small fraction of glycosylated cyanidins 15. Phenolic acids and their derivatives also show accumulation patterns that depend on the fruit developmental stage 8. While hydroxybenzoic acid derivatives are the dominant class during the first stages of fruit growth, they are progressively replaced by caffeic and ferulic acid hexose derivatives and finally by coumaric and sinapic acid derivatives, which are almost solely detected in the red ripe fruit 8. In strawberry, a number of studies have analyzed polyphenol levels 14,[16][17][18]; however, knowledge about the genetic mechanisms underlying these processes is more limited [19][20][21]. Quantitative trait loci (QTL) mapping in bi-parental populations is a method commonly used to dissect the genetic basis of agronomic traits. In strawberry, the majority of studies on both natural variance and metabolite QTL mapping have focused on primary metabolism [22][23][24][25].
These studies have identified genomic regions underlying sugar content and titratable acidity, vitamin content and variation in other primary metabolites including organic acids and amino acids in fruits. Targeted QTL analyses have also been performed on volatile organic compounds in strawberry fruit 26,27. Screens of natural variance have additionally focused on a similar range of compounds 14,28. Moreover, some of these studies were able to identify candidate genes encoding biosynthetic enzymes affecting these important fruit quality traits.
Figure 1. … 34,58,59. Characterized enzymes and transcription factors regulating biosynthetic genes are shown, as well as gene numbers according to the F. vesca reference genome v.4. ANS, anthocyanidin synthase; ANR, anthocyanidin reductase; C4H, cinnamic acid 4-hydroxylase; CHI, chalcone isomerase; CHS, chalcone synthase; 4CL, 4-coumaroyl-CoA ligase; DFR, dihydroflavonol reductase; F3H, flavanone 3-hydroxylase; F3′H, flavonoid 3′-hydroxylase; FLS, flavonol synthase; GT1, anthocyanidin glucosyltransferase; GT2, (hydroxy)cinnamic acid and (hydroxy)benzoic acid glucosyltransferase; LAR, leucoanthocyanidin reductase; PAL, phenylalanine ammonia lyase; SDH, shikimate dehydrogenase.
… 20,21. In those studies, the content of 13-21 polyphenol compounds was determined and QTL controlling their variation were detected. In this study, a broader scale analysis of fruit polar secondary metabolite levels was performed in two independent harvests on the '232' × '1392' F1 strawberry population, which has been previously characterized for agronomic and fruit quality traits 22,24,26. The objective of this work is to extend the phenotypic characterization of this population to the polyphenol composition and antioxidant capacity of the fruits. First, we measured total polyphenol content (TPC) and fruit antioxidant capacity as a rapid estimation of antioxidant content in strawberry fruits.
Next, we performed metabolite analysis using Ultra Performance Liquid Chromatography coupled to Tandem Mass Spectrometry (UPLC-Orbitrap-MS/MS) to tentatively identify and semi-quantify secondary metabolites in the population. A QTL mapping approach was then carried out to detect genomic regions and candidate genes involved in the biosynthesis and regulation of phenylpropanoid-derived metabolites in strawberry fruit. Results are discussed both in the context of the regulation of secondary metabolism in strawberry fruits and with respect to the efficiency of identified markers to assist the nutritional fortification of fruits during crop breeding.

Results
Variation in total polyphenol content, antioxidant capacity and content of phenylpropanoid metabolites in the '232' × '1392' population. To analyze the antioxidant content of strawberry fruits, total polyphenol content (TPC) and Trolox equivalent antioxidant capacity (TEAC) were analyzed in the whole F 1 population and the two parental lines during two consecutive years, 2013 and 2014. Parental line '1392' presented higher TPC and TEAC in both years, although the difference was only significant for TEAC in 2014 (Table 1, see sub-section "Variation in Total Polyphenol Content and Antioxidant Capacity" in the Supplementary Note for a detailed description).
To focus on the variation in individual secondary metabolites that can contribute to TPC and TEAC, we next evaluated which phenylpropanoid-related metabolites were present in ripe fruits from the two parental lines and their F1 progeny by Ultra Performance Liquid Chromatography coupled to Tandem Mass Spectrometry (UPLC-Orbitrap-MS/MS), using the same samples previously profiled for TPC and TEAC. We were able to annotate the chemical structure of 78 metabolites, including 53 flavonoids, 14 hydroxycinnamic acid derivatives, six hydroxybenzoic acid derivatives, and five terpenoids. The flavonoid class included 21 condensed tannins or proanthocyanidins (10 propelargonidin and 11 procyanidin oligomers), six flavan-3-ols, 16 flavonols, three flavanones, six anthocyanins and one flavone (Table 2; Supplementary Table S1).
Hierarchical cluster analysis of phenylpropanoid metabolites in the '232' × '1392' population. Figure 2 provides a hierarchical cluster analysis (HCA) of samples harvested in 2013 and 2014. Similar to the observation for primary metabolites 24, the range in metabolite contents in the F1 progeny was larger than that seen between the two parental lines, ranging from no detectable metabolite in some F1 lines to a 13.74-fold change compared to the '1392' parent. As can be observed in Fig. 2, metabolites were grouped into two main clusters (A and B), illustrating biochemical relations within strawberry secondary metabolism (see sub-section "Hierarchical Cluster Analysis of Phenylpropanoid Metabolites in the '232' × '1392' Population" in Supplementary Note for a detailed description of metabolite relations).
Table 1. Total polyphenol content (TPC) and antioxidant capacity (TEAC) in fruits of '232', '1392' and F1 progeny. The mean values, standard deviations (SD), range, significance level of the difference between parental line means, and broad sense heritability (H2) of the traits are described. a ns and * indicate values that are not significant or significant at P < 0.05, respectively. b Trait segregation was declared transgressive (Transg.) when at least one progeny had a value higher or lower than the highest or lowest parental value by at least the SD of the parents. c Pearson correlation between years.
Heritability of phenylpropanoid metabolites. The majority of secondary metabolites showed moderate to high broad sense heritability in both years, ranging from 0 to 0.87 in 2013, and from 0.11 to 0.72 in 2014 (Table 2). High heritability values (H2 > 0.5) for both seasons were shown for (epi)afzelechin 1, kaempferol-hexose 2, rutin 2, coumaric acid-hexose 2, ferulic acid-hexose 1, sinapic acid-hexose 2, coumaric acid 2, 1-O-protocatechuyl-beta-xylose and sesquiterpenoid hexose 1 (Table 2).
The average broad sense heritability of secondary metabolites was higher in 2013 (0.54) than in 2014 (0.35).
… performed essentially as described for detection of mQTLs for primary metabolites 24, using the same fruit samples collected in years 2013 and 2014. Therefore, the detection of mQTLs for polar secondary metabolites extended the previous characterization of QTL controlling fruit quality using the '232' × '1392' F1 population 22,24,26. QTLs detected slightly below the threshold by rMQM were included if a significant association between marker and trait was observed by the Kruskal-Wallis test (P ≤ 0.005), or if a QTL was detected in the same genomic region for the same metabolite in the other analyzed year or for another related metabolite in any of the analyzed years.

Detection of QTLs for fruit antioxidant capacity and phenylpropanoid metabolites
In total, we detected 7 QTLs controlling TPC (5 in year 2013 and 2 in 2014). None of them was detected during both years ( Fig. 4; Supplementary Table S3). Similarly, 4 and 5 QTLs for TEAC were detected in 2013 and 2014, respectively, and all of them were observed only in one of the two years. Two QTLs controlling TPC, qTPC-IV-1-2013 and qTPC-IV-3-2014, were detected at approximately the same genomic regions in two homoeologous linkage groups (LGs) belonging to homoeology group IV (Fig. 4). These QTLs could therefore be considered as putative homoeo-QTLs. One QTL controlling TPC and one controlling TEAC in 2014 overlapped in LG VI-5, which might indicate the presence of a locus controlling both traits. However, the mean effects associated to each allele differed between the two QTLs (Supplementary Table S3), suggesting the presence of two linked loci with positive alleles for TPC and TEAC coming from the male and female parent, respectively.
A total of 309 mQTLs for 77 out of the 78 metabolites were detected over the two years (Fig. 4; Supplementary Table S3). Among them, 41 QTLs were detected in the same chromosomal regions (with overlapping confidence intervals) in both years and were considered stable, while 227 mQTLs were detected in only one year. Counting each stable mQTL once rather than once per year (2 × 41 + 227 = 309 detections) gives 268 distinct mQTLs, of which only 15.3% were stable over the two years. The majority of these stable mQTLs (36 … LGs and others, suggesting the presence of loci controlling the levels of antioxidant compounds in a coordinated way (Fig. 4). See sub-section "Clusters of QTL for Phenylpropanoid Metabolites" in Supplementary Note for a detailed description of QTL clusters detected for proanthocyanidins, hydroxycinnamic acid derivatives, terpenoids and (epi)afzelechin. The largest hotspots for flavonoid compounds were located on LG V-2 and V-4, and included many stable mQTLs with major effects on metabolite variance (Fig. 4; Supplementary Table S3). On LG V-2, stable mQTLs were detected for the anthocyanins pelargonidin-hexose (the main pigment in strawberry), pelargonidin acetyl hexose and pelargonidin rutinose, for derivatives of the flavonol kaempferol and for derivatives of the flavanone eriodictyol. Similarly, another cluster of mQTLs including a similar range of flavonoids was observed in LG V-4, with stable QTLs for the anthocyanins pelargonidin acetyl hexose 1 (also detected on LG V-2) and cyanidin hexose, the two isomers of the flavanol epicatechin glucuronide, the flavonols isorhamnetin glucuronide, kaempferol hexose 1 and rutin 2, and the flavanone naringenin chalcone hexose.
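The stability criterion and the resulting counts can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary layout of a QTL record are hypothetical, and only the counts (41 stable, 227 single-year, 309 detections) come from the text.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (start_cM, end_cM) confidence intervals share any position."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def is_stable(qtl_year1, qtl_year2):
    """Stable = same metabolite, same linkage group, overlapping intervals.

    Each QTL is a hypothetical record with keys 'metabolite', 'lg' and 'ci'.
    """
    return (
        qtl_year1["metabolite"] == qtl_year2["metabolite"]
        and qtl_year1["lg"] == qtl_year2["lg"]
        and intervals_overlap(qtl_year1["ci"], qtl_year2["ci"])
    )


def stability_summary(n_stable, n_single_year):
    """Relate per-year detections to distinct mQTLs."""
    distinct = n_stable + n_single_year          # each stable mQTL counted once
    detections = 2 * n_stable + n_single_year    # stable mQTLs are seen in both years
    return {
        "distinct": distinct,
        "detections": detections,
        "pct_stable": round(100 * n_stable / distinct, 1),
    }


# Counts reported in the text
summary = stability_summary(n_stable=41, n_single_year=227)
```

With the reported counts, `summary` contains 268 distinct mQTLs, 309 detections and a stable fraction of 15.3%.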

Candidate genes at mQTL hotspots on LG V-2 and LG V-4. Hotspot genomic regions with major mQTL on LG V-2 and V-4 were selected for candidate gene identification based on colocalization to the F. vesca and/or F. × ananassa 'Camarosa' genomes [29][30][31]. Several mQTLs for flavanones, flavonols, anthocyanins, flavan-3-ols and propelargonidins co-located in those two genomic regions. No reported gene encoding an enzyme of the flavonoid pathway was located in the F. vesca reference genomic regions between the markers flanking the QTL confidence intervals. However, a number of predicted genes with similarity to structural genes or transcription factors were detected. Interestingly, the transcriptional repressor FaMYB1 (FvH4_5g17120) is located within the F. vesca region orthologous to the QTL hotspot interval in LG V-4. FaMYB1 is a repressor of anthocyanin and flavonol biosynthesis in strawberry 32,33. A search of the octoploid strawberry genome 31 identified 5 copies of FaMYB1 in the 'Camarosa' reference genome, with one copy on each of chromosomes Fvb5-1 (FxaC_17g22290), Fvb5-2 (FxaC_20g18010) and Fvb5-4 (FxaC_19g15290) and two copies on chromosome …
The gene FaF3′H (FvH4_5g14010), encoding the flavonoid 3′-hydroxylase enzyme, is located just outside the genomic region corresponding to the confidence intervals in LG V-2 and V-4 in both the F. vesca and F. × ananassa genomes. The F3′H enzyme catalyzes the hydroxylation of flavonoids upstream of DFR 34 (Fig. 1). There are three copies of FaF3′H in the octoploid genome, on chromosomes Fvb5-1, Fvb5-2 and Fvb5-3, with FaF3′H genes FxaC_17g18710 and FxaC_18g31790 located close to the QTL hotspots on LG V-2 and LG V-4 at positions 8.87 and 20.27 Mb, respectively.

Expression of FaMYB1 and FaF3′H genes in F1 lines contrasting in flavonoid content. To investigate the involvement of FaMYB1 and FaF3′H in the variation of flavonoids in the hotspots on LG V-2 and LG V-4, we analyzed the expression of both genes in contrasting F1 lines. For the hotspot on LG V-2, we focused the analysis on the metabolites corresponding to the 10 stable mQTLs (highlighted in yellow in Fig. 4). Among them, two groups of metabolites could be distinguished based on significant correlations: (1) propelargonidin dimer 2 and kaempferol hexose 2 and (2) the rest of the metabolites (pelargonidin derivatives, eriodictyol hexose 1 and 2 and the benzoic acid derivative) with the exception of kaempferol hexose glucuronide. Strikingly, kaempferol hexose glucuronide was not significantly correlated with either of the two groups. Each of these two groups contained secondary metabolites with high positive correlations among them in the two years; for example, propelargonidin dimer 2 and kaempferol hexose 2 had a correlation coefficient of ~0.7 in both years (Fig. 3; Supplementary Table S2). In addition, the mQTL for metabolites in each group displayed the same direction of allelic effects: a positive allele (increasing the concentration) was inherited from the female and male parent for groups (1) and (2), respectively. Together, these data suggest that the variation of each group of metabolites might be controlled by a different gene. Therefore, two groups of contrasting lines, (V-2-1) and (V-2-2), each composed of two pools of six lines, were selected based on the concentration of metabolites belonging to these two groups for differential expression analysis (Supplementary Table S4).
Among the metabolites affected by the major mQTLs in the LG V-4 hotspot, epicatechin glucuronide isomers 1 and 2, kaempferol hexose 1, cyanidin hexose and rutin isomer 2 displayed high positive correlations (Fig. 3; Supplementary Table S2). Except for rutin 2, correlations between these metabolites were higher than 0.9 in both years. The correlation coefficients for rutin 2 with the rest of the metabolites ranged from 0.6 to 0.7. In addition, positive alleles for the major mQTLs controlling the variation of these five metabolites were in all cases inherited from both parental lines, suggesting the presence of a common locus in heterozygosis in both parental lines controlling their variation. Therefore, pools of contrasting lines with high and low content of these five flavonoids (V-4) were selected for expression studies. Selected contrasting lines with their relative metabolite content, as well as the genotypes of markers in the mQTL intervals, are shown in Supplementary Table S5. Quantitative real time PCR (qRT-PCR) analyses in selected pools were carried out to compare FaMYB1 transcript levels in high and low accumulating lines (Fig. 5a). No significant differences were observed between contrasting lines for any metabolite. In contrast, significant differences in FaF3′H expression were observed between pools contrasting in the content of metabolites in V-2-1 and V-4 (Fig. 5b). Thus, lines with high levels of propelargonidin dimer 2 and kaempferol hexose 2 (V-2-1), and of epicatechin glucuronide isomers 1 and 2, kaempferol hexose 1, cyanidin hexose and rutin isomer 2 (V-4), showed significantly higher expression of FaF3′H, suggesting that this candidate gene might be controlling natural variation of these related flavonoids in strawberry.

Discussion
Total polyphenol content and antioxidant capacity in fruits and vegetables are common parameters for estimating their health benefits. However, both traits are aggregated or composite traits that gather the variation of many different antioxidant metabolites, such as polyphenolic compounds and vitamins such as ascorbic acid 35. In this work, both composite traits, TPC and TEAC, and the majority of phenylpropanoid-related metabolites appear to be under strong environmental control. Thus, all QTL controlling TPC and TEAC and 84.7% of the mQTL controlling individual metabolites were detected in only one of the two assessed years. Similar results were recently reported by Labadie et al. 20 in a different F1 population of octoploid strawberry, in which the authors detected low stability of QTLs for flavonoids in ripe fruits. Because antioxidant metabolites protect against oxidative stress caused by several environmental factors (e.g. pathogens, water stress, temperature, and light), the environment plays a major role in inducing them 7. Consequently, it is expected that the genetic effect on composite traits such as TPC and TEAC may not be as strong as the environmental effect across different harvests. The variation observed can also be due to plant-to-plant variability within the three biological replicates, as observed in particular for the TPC and TEAC values, where the standard deviation (SD) in several genotypes, i.e. parental lines, was as high as the SD observed among the F1 progeny (Table 1). Similarly, other studies in strawberry and tomato have reported strong environmental effects on polyphenols and antioxidant capacity 19,20,[36][37][38]. Nevertheless, 15.3% of mQTLs controlling secondary metabolite content were detected in the two assessed years. Furthermore, broad sense heritability for TPC, TEAC and the majority of secondary metabolites in strawberry fruit was moderate, suggesting an important genetic component.
This agrees with results in Arabidopsis and tomato where secondary metabolites showed higher heritability than primary metabolites 39,40 . Weak correlation between different flavonoids and antioxidant capacity has been recently reported in an independent study in strawberry fruit 20 . Taken together, all these data indicate that marker-assisted selection (MAS) for increasing TPC and TEAC in strawberry fruit may not be feasible, and instead, breeding strategies for antioxidant content improvement should be directed to specific metabolites with mQTLs controlling large amount of variance and stable in different years.
A great variation in the content of phenylpropanoid-derived metabolites was observed across the population, with the majority of them showing more than a tenfold change. Although the variation for primary metabolites across this population was also high 24, a larger variation was in general observed for secondary metabolites. Furthermore, a large proportion of secondary metabolites were not detected (concentrations below the detection limit) in a number of F1 lines while showing high levels in other lines. Transgressive segregation was generally observed towards lower values. A similar range of variation and both environmental and genetic effects have also been observed in the wild diploid Fragaria vesca introgressed with F. bucharica and in a population of the octoploid strawberry 19,20. HCA and metabolite-metabolite correlations highlighted co-regulation of metabolites based on their biochemical relationships. As expected, strong positive correlations were observed within common metabolic classes, such as proanthocyanidins and flavan-3-ol precursors, or among hydroxycinnamic acid derivatives. Correlation of different or related metabolites in combination with detection of epistatic mQTLs (in QTL hotspots) can help in deciphering which pathway(s) or enzyme(s) are likely involved in affecting their natural variation. Surprisingly, few negative correlations were observed between different classes. Interestingly, some negative correlations were observed between different flavonoids and hydroxycinnamic acid derivatives. Indeed, competition between flavonoid and lignin biosynthesis for a common substrate (coumaroyl-CoA) has been previously reported in strawberry 28,41,42.
QTLs controlling secondary metabolites were well spread across the strawberry genome. However, in several instances, hotspots of mQTLs were detected, which might suggest the presence of a locus controlling several related metabolites. Interestingly, the majority (28 out of 36) of mQTLs that were stable over the two years and controlled a large proportion (an average R 2 > 20% over the two years) of phenotypic variance were clustered in hotspots on LG IV-3, LG V-2 and V-4, and LG VI-1 and VI-2 (highlighted in yellow in Fig. 4; Supplementary  Table S3). Loci on those regions and markers associated represent useful targets for marker assisted selection of new varieties with increased levels of antioxidant compounds. Detailed information about putative underlying candidate genes on LG IV-2, LG IV-3 and LG I-2 is discussed in Supplementary Discussion.
The largest clusters of mQTLs were detected in LG V-2 and V-4. Detailed analysis of correlations between metabolites and between the QTL allelic effects suggested the presence of at least two linked loci in each hotspot controlling subsets of phenylpropanoid metabolites. Two candidate genes, one encoding a well-known transcription factor, FaMYB1, and the other a flavonoid biosynthetic enzyme, flavonoid-3′-hydroxylase (FaF3′H), were identified within or close to the QTL hotspots. FaMYB1 is a repressor of anthocyanin and flavonol biosynthesis in strawberry 32,33. Reduction of FaMYB1 expression by RNAi resulted in increased expression of ANR and LAR transcripts and accumulation of flavan-3-ols, which might affect proanthocyanidin accumulation 32. However, no significant differences in the expression level of FaMYB1 were found between pools of lines contrasting in the content of these flavonoids, which argues against this gene being the underlying candidate, at least at the level of gene expression.
Nevertheless, we cannot yet formally exclude this TF as a candidate gene, and ORF sequence analyses and more detailed gene expression studies may allow this hypothesis to be addressed more comprehensively.
The gene flavonoid-3′-hydroxylase (F3′H) is located very close to the mQTL hotspots in LG V-2 and V-4. Expression analysis of F1 lines contrasting in the levels of propelargonidin dimer 2, cyanidin hexose, epicatechin glucuronide 1 and 2, the flavonols kaempferol hexose 1 and 2, and rutin 2 showed that high expression of FaF3′H is associated with high content of these flavonoids. The activity of F3′H on early compounds determines the 3′-hydroxylation pattern of the B-ring in downstream flavonoid compounds 43. Therefore, increased levels of dihydroxylated flavonoids (cyanidin, epicatechin and rutin), whose variation is largely controlled by the mQTL in LG V-4, are expected in lines with higher FaF3′H expression. However, why monohydroxylated flavonoids such as propelargonidins or kaempferol hexose accumulate to higher levels in lines with higher FaF3′H expression needs further investigation. One possibility could be that FaF3′H has higher substrate specificity for dihydrokaempferol than for kaempferol. Different substrate specificities have been reported for other enzymes in the flavonoid biosynthetic pathway such as DFR 44 or 4CL enzymes 45.
Cultivated strawberry fruits display a prevalence of monohydroxylated flavonoids (i.e. pelargonidins) in comparison to other berry fruits and to the wild F. vesca, which accumulate a higher amount of dihydroxylated flavonoids such as cyanidins and quercetin 14,43. Direct proof of the mechanisms underlying the partition across these metabolites is still lacking. However, our results suggest that variation in the candidate FaF3′H gene might be the underlying molecular mechanism affecting their natural variation in octoploid strawberry. Interestingly, an eQTL controlling the expression of FaF3′H has been detected in a wider collection of cultivars 46, further indicating an extensive natural variation in the transcript levels of this gene in ripe fruit.
No association between transcript levels of FaMYB1 or FaF3′H and the variation in several pelargonidin derivatives, eriodictyol hexose 1 and 2 and the benzoic acid derivative was detected in our study (V-2-2 in Fig. 5). The mQTLs in the hotspot of LG V-2 associated with the variation in these flavonoids controlled a large amount of the observed phenotypic variance (Supplementary Tables S3 and S4). Among these metabolites, pelargonidin derivatives are the main anthocyanins that contribute to strawberry fruit color. In agreement, QTLs for color-related traits were detected in the same genomic region of LG V-2 in a previous study using the same population 22. Although no candidate gene has been identified yet in this region, the markers linked to these QTLs identified here, such as ChFaM044, M00247-47:C>G and M30873-45:T>G (Supplementary Table S4), represent useful tools for accelerating breeding for fruit color in octoploid strawberry.

Methods
… 22,26, and variation in primary metabolite traits in ripe fruits has also been characterized for the same years used in this study 24. About 25 fully mature fruits from all F1 individuals and parental lines were harvested on the same morning at the peak of the harvest in each of the two consecutive seasons. Per line, all selected fruits were phenotypically similar in shape, size, firmness and colour. Fruit from each line was pooled into three biological replicates, immediately frozen, ground in liquid nitrogen, and stored at −80 °C until analysis.
Extraction and quantification of total polyphenol content and antioxidant capacity. For extraction of polyphenols, 0.5 g of frozen powder from fully ripe strawberry fruit was homogenized in 1 ml of acetone:acetic acid (99:1 v/v) solution by vortexing for 2 min, and then centrifuged at 8000 rpm for 15 min at 4 °C. The supernatants were transferred to vials and stored at −80 °C until used to determine both total polyphenol content and antioxidant capacity. Total polyphenol content (TPC) was determined following the Folin-Ciocalteu method 47. Ten µl of extract was diluted with 175 µl of Milli-Q water, and then 12 µl of Folin-Ciocalteu reagent was added. After 3 min, 30 µl of 20% sodium carbonate solution was added. Samples were incubated for 1 h at room temperature in the dark and then the absorbance at 760 nm was measured in a UV/Vis microplate spectrophotometer (Thermo Scientific Multiskan GO). Gallic acid was used as standard for the calibration curve, and the results were expressed in milligrams of gallic acid equivalents (GAE) per 100 g fresh weight (mg GAE/100 g FW).
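The conversion from absorbance to mg GAE/100 g FW can be illustrated with a minimal Python sketch. It is not the authors' calculation: the calibration points are hypothetical, the standards are assumed to be expressed in mg gallic acid per ml and measured under the same assay conditions as the extracts, and the extract volume is assumed to equal the 1 ml of solvent added to 0.5 g of tissue.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tpc_mg_gae_per_100g(absorbance, slope, intercept, extract_ml=1.0, sample_g=0.5):
    """Convert an absorbance at 760 nm into mg GAE per 100 g fresh weight."""
    conc_mg_per_ml = (absorbance - intercept) / slope   # mg GAE per ml of extract
    mg_per_g_fw = conc_mg_per_ml * extract_ml / sample_g
    return mg_per_g_fw * 100.0


# Hypothetical gallic acid standards (mg/ml) and their absorbances
standards_mg_per_ml = [0.0, 0.25, 0.5, 1.0]
standards_abs = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_line(standards_mg_per_ml, standards_abs)

# A hypothetical sample reading
sample_tpc = tpc_mg_gae_per_100g(0.55, slope, intercept)
```

With these made-up standards the fitted line has slope 1 and intercept 0.05, so an absorbance of 0.55 corresponds to 0.5 mg GAE/ml of extract, i.e. about 100 mg GAE/100 g FW.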
Antioxidant capacity of fruit samples was measured by the ability of antioxidant molecules to quench the ABTS·+ radical cation [2,2′-azinobis(3-ethylbenzothiazoline-6-sulfonate); Sigma-Aldrich] in comparison with the antioxidant activity of standard amounts of Trolox. The Trolox equivalent antioxidant capacity (TEAC) assays were performed as described 48,49 using 2 µl of extract and 250 µl of radical reagent. The absorbance was measured after 5 min at 25 °C using the above-described spectrophotometer. Results were expressed in µmoles of Trolox equivalents per gram of fresh weight (µmol TE/g FW).
Extraction and analysis of secondary metabolites by UPLC-Orbitrap-MS/MS measurements. Secondary metabolites were extracted using 250 mg of frozen material. The extraction procedure was performed as follows: 1.0 ml of a cold mixture of methyl-tert-butyl-ether:methanol (3:1) was added and the mixture was sonicated at room temperature for 10 min. Then, 0.65 ml of a mixture of water:methanol (3:1) was added to each vial. The vials were then centrifuged at 4 °C and 10,000g for 5 min. A fixed volume (0.6 ml) of the polar phase was transferred to a fresh vial before concentrating the extract to dryness in a Speed-vac (Centrivac, Heraeus Instrument, Hanau, Germany). Polar secondary metabolites were determined by UPLC-Orbitrap-MS/MS as described in Vallarino et al. 14. Pre-processing of raw chromatograms was performed using Expressionist Refiner MS 10.0 (GeneData; https://www.genedata.com) with an established workflow. Metabolites were putatively annotated by searching the m/z value against the KEGG compound database 50. The MS/MS fragmentation of the metabolites was compared with putative molecules found in databases and verified with literature on similar compounds reported in strawberry. Integration of the peak area of the corresponding molecular ion was used to quantify the metabolites in the different lines of the population. Data were expressed as the relative content of each metabolite compared to the '1392' parental line.
Data analysis. Normality of trait distributions was evaluated by the Kolmogorov-Smirnov test. For most metabolites deviating from normality (P < 0.05), a number of transformations (Ln, square root, inverse of square root, square, cube, reciprocal, and arcsine) were tested and the transformation that gave the least-skewed result was used to perform QTL analysis. Correlation analysis based on Pearson correlation, Student's t-test, ANOVA and Tukey HSD test were performed in GraphPad Prism 8 or using R software. Broad sense heritability ($H^2 = V_G / V_P$, where $V_G$ is the total genetic variance and $V_P$ is the total phenotypic variance) was calculated from the variance components obtained by ANOVA as previously described 51.
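As an illustration of how broad sense heritability can be obtained from ANOVA variance components, the sketch below assumes a balanced one-way design with genotype as the only factor and the biological replicates as residual. The text does not specify the exact model used (it refers to ref. 51), so the model, the single-replicate definition of the phenotypic variance and the function name are assumptions.

```python
def broad_sense_heritability(replicates_by_line):
    """Estimate H2 = VG / VP from a balanced one-way ANOVA.

    replicates_by_line maps each line (genotype) to its replicate values.
    Assumed model: VG = (MS_genotype - MS_error) / r, VE = MS_error,
    VP = VG + VE, with negative VG estimates set to zero.
    """
    groups = list(replicates_by_line.values())
    g = len(groups)
    r = len(groups[0])
    if g < 2 or r < 2 or any(len(v) != r for v in groups):
        raise ValueError("need a balanced design with >= 2 lines and >= 2 replicates")

    line_means = [sum(v) / r for v in groups]
    grand_mean = sum(line_means) / g

    # Mean squares between genotypes and within genotypes
    ms_genotype = r * sum((m - grand_mean) ** 2 for m in line_means) / (g - 1)
    ms_error = sum(
        (y - m) ** 2 for v, m in zip(groups, line_means) for y in v
    ) / (g * (r - 1))

    v_g = max(0.0, (ms_genotype - ms_error) / r)
    v_p = v_g + ms_error
    return v_g / v_p if v_p > 0 else 0.0


# Hypothetical relative metabolite contents, three replicates per line
example = {"line_A": [1.0, 2.0, 3.0], "line_B": [4.0, 5.0, 6.0], "line_C": [7.0, 8.0, 9.0]}
h2 = broad_sense_heritability(example)
```

For this toy data set the between-line mean square is 27 and the within-line mean square is 1, giving VG = 26/3 and H2 = 26/29 (about 0.90).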
QTL analysis. QTL analysis was conducted on the integrated map previously developed for the '232' × '1392' mapping population using MapQTL 5 52,53. The population was coded under the population type 'cross pollinated' (CP) and QTL analyses were performed as reported for primary metabolites 24. Essentially, the raw relative data from each year were first analyzed by the nonparametric Kruskal-Wallis rank-sum test. A stringent significance level of P ≤ 0.005 was used as a threshold to identify markers linked to QTL. Second, raw data, or transformed data for non-normally distributed traits, were used for QTL detection using interval mapping (IM) with a step size of 1 cM and a maximum of five neighboring markers. The significance LOD threshold for QTL was determined using a 1000-permutation test for each trait. QTL with LOD scores at or above the 95% genome-wide threshold were declared significant. Restricted multiple QTL mapping (rMQM) was then performed using the nearest significant markers as co-factors. Significant mQTL locations and 1- and 2-LOD confidence intervals were drawn using MapChart 2.2.
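The idea behind a permutation-based genome-wide threshold can be sketched as follows. This is not the MapQTL implementation: for simplicity it uses a single-marker regression LOD instead of interval mapping, and the function names and data layout are hypothetical. Trait values are shuffled relative to the genotypes, the maximum LOD across all markers is recorded for each shuffle, and the 95th percentile of these maxima is taken as the threshold.

```python
import math
import random


def marker_lod(genotypes, trait):
    """Single-marker LOD = (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(trait)
    grand_mean = sum(trait) / n
    rss_null = sum((y - grand_mean) ** 2 for y in trait)

    # Residual sum of squares after fitting one mean per genotype class
    by_class = {}
    for g, y in zip(genotypes, trait):
        by_class.setdefault(g, []).append(y)
    rss_marker = 0.0
    for values in by_class.values():
        class_mean = sum(values) / len(values)
        rss_marker += sum((y - class_mean) ** 2 for y in values)

    if rss_null == 0.0:
        return 0.0
    if rss_marker == 0.0:
        return math.inf
    return (n / 2.0) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, trait, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from the permutation distribution of the maximum LOD."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        maxima.append(max(marker_lod(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return maxima[index]
```

A marker would then be declared significant when its LOD on the unshuffled data reaches the value returned by `permutation_threshold(markers, trait)`, where `markers` is a list of per-marker genotype lists ordered like `trait`.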
In silico candidate gene search. Physical …
RNA extraction and qRT-PCR. Each of the three groups of contrasting lines consisted of six lines with high and six lines with low content of the selected metabolites. For each line, three biological replicates consisting of about eight ripe fruits each were analyzed. Total RNA was extracted from strawberry fruits as previously reported 24. Before reverse transcription, RNA was treated with DNase I (Fermentas) to eliminate contaminating genomic DNA. First-strand cDNA synthesis was performed using 750 ng of RNA in a final volume of 20 µl using the iScript cDNA synthesis kit (Bio-Rad), according to the supplier's protocol. Relative quantification of transcripts was performed by qRT-PCR using the SsoAdvanced Universal SYBR Green Supermix (Bio-Rad) and the comparative cycle threshold method. Expression data were normalized to the reference genes FaCHP1 (FvH4_7g15070) and FaGAPDH2 (FvH4_4g24420). FaMYB1 (FvH4_5g05100) transcripts were quantified using primers Forward: GGC GTG GTC GAT CCA AGA and Reverse: GCA ACC TTC GCC GTG TTT T. Primers used for FaF3′H (Forward: CCG TAG CGT CTC AGT TCT TG; Reverse: ACG AGG TCC TGG TAG TTG TA) are described in 55. Primers used for FaCHP1 (Forward: TGC ATA TAT CAA GCA ACT TTA CAC TGA; Reverse: ATA GCT GAG ATG GAT CTT CCT GTG A) are described in 56. Primers used for FaGAPDH2 (Forward: TCC ATC ACT GCC ACC CAG AAG ACT G; Reverse: AGC AGG CAG AAC CTT TCC GAC AG) are described in 57.
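The comparative cycle threshold calculation with two reference genes can be sketched as below. The Ct values are hypothetical, the function names are illustrative, and the sketch assumes an amplification efficiency of about 100% for every primer pair (a doubling per cycle), which the text does not state.

```python
def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the mean Ct of the reference genes.

    Averaging Ct values corresponds to taking the geometric mean of the
    reference gene expression levels.
    """
    return target_ct - sum(reference_cts) / len(reference_cts)


def relative_expression(sample, calibrator):
    """Fold change of sample versus calibrator by the 2^(-ddCt) method.

    Each argument is a (target_ct, [reference_ct, ...]) pair.
    """
    dd_ct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: FaF3'H as target, FaCHP1 and FaGAPDH2 as references
high_pool = (22.0, [18.0, 20.0])   # dCt = 22 - 19 = 3
low_pool = (24.0, [18.0, 20.0])    # dCt = 24 - 19 = 5
fold_change = relative_expression(high_pool, low_pool)
```

Here the high pool has a dCt two cycles lower than the low pool, which corresponds to a fourfold higher relative expression.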