Major-effect candidate genes identified in cultivated strawberry (Fragaria × ananassa Duch.) for ellagic acid deoxyhexoside and pelargonidin-3-O-malonylglucoside biosynthesis, key polyphenolic compounds

Strawberries are rich in polyphenols which impart health benefits when metabolized by the gut microbiome, including anti-inflammatory, neuroprotective, and antiproliferative effects. In addition, polyphenolic anthocyanins contribute to the attractive color of strawberry fruits. However, the genetic basis of polyphenol biosynthesis has not been extensively studied in strawberry. In this investigation, ripe fruits from three cultivated strawberry populations were characterized for polyphenol content using HPLC-DAD-MSn and genotyped using the iStraw35k array. GWAS and QTL analyses identified genetic loci controlling polyphenol biosynthesis. QTL were identified on four chromosomes for pelargonidin-3-O-malonylglucoside, pelargonidin-3-O-acetylglucoside, cinnamoyl glucose, and ellagic acid deoxyhexoside biosynthesis. Presence/absence of ellagic acid deoxyhexoside and pelargonidin-3-O-malonylglucoside was found to be under the control of major gene loci on LG1X2 and LG6b, respectively, on the F. × ananassa linkage maps. Interrogation of gene predictions in the F. vesca reference genome sequence identified a single candidate gene for ellagic acid deoxyhexoside biosynthesis, while seven malonyltransferase genes were identified as candidates for pelargonidin-3-O-malonylglucoside biosynthesis. Homologous malonyltransferase genes were identified in the F. × ananassa ‘Camarosa’ genome sequence but the candidate for ellagic acid deoxyhexoside biosynthesis was absent from the ‘Camarosa’ sequence. This study demonstrated that polyphenol biosynthesis in strawberry is, in some cases, under simple genetic control, supporting previous observations of the presence or absence of these compounds in strawberry fruits. It has also shed light on the mechanisms controlling polyphenol biosynthesis and enhanced the knowledge of these biosynthesis pathways in strawberry. The above findings will facilitate breeding for strawberries enriched in compounds with beneficial health effects.


Introduction
Commercial production of the cultivated strawberry (Fragaria × ananassa Duch.) has increased steadily in recent years with~12.9 million tons of fruit sold globally in 2017 (http://www.fao.org/faostat/). Increased consumer demand for strawberries is partly due to a greater health consciousness among the consumers and an awareness of the health promoting benefits associated with the consumption of fresh fruits. Strawberries have been shown to contain a wealth of 'health promoting' compounds, many of which have been reported to play a role in reducing risk factors for cardiovascular diseases 1,2 . Strawberries are rich in dietary fiber, vitamin C and a range of secondary plant metabolites, including polyphenol compounds which exert numerous positive health benefits to consumers [3][4][5] . Ingested polyphenols are predominantly utilized in the colon, where the gut microbiota converts isoflavones, ellagitannins, and lignans to equol, urolithins, and enterolignans, respectively, which have anti-inflammatory effects and induce antiproliferative activities in humans 4 . Recent studies have also shown that bioactive metabolites derived from dietary polyphenols by gut microbiota exert neuroprotective effects upon crossing the blood-brain barrier 6 and polyphenols have been evaluated as therapeutics for neurodegenerative diseases 3 .
From the perspective of breeding, polyphenolic compounds have received attention not only because of their beneficial health effects, but also because they contribute to enhanced sensorial properties of the berry experience for the consumer 1,7,8 . Indeed, strawberries are one of the fruits richest in ellagitannins which together with the anthocyanins and proanthocyanidins, represent the highest proportion of their polyphenol content [9][10][11][12] . Polyphenol compounds that accumulate in ripe strawberries include flavonoids, comprising anthocyanins, flavonols and flavan-3-ols, as well as phenolic acids and ellagitannins 10,11 . The anthocyanins, which are responsible for the red color of the berries, consist of four main compounds; pelargonidin-3-O-glucoside, pelargonidin-3-O-malonylglucoside and, to a lesser extent, pelargonidin-3-O-rutinoside, and cyanidin-3-O-glucoside 10,12 . The genetic control of anthocyanin biosynthesis has been extensively studied in the diploid strawberry species F. vesca. Initially, a single genetic locus (c) was shown to be responsible for the yellow-fruited mutants of the species such as 'Yellow Wonder' 13,14 and the gene Flavanone-3-hydroylase (F3H) in the anthocyanin biosynthesis pathway was proposed as a candidate gene for this locus 15 . Subsequently, however, candidate SNPs in the transcription factor FveMYB10 were confirmed to be responsible for the yellow-fruited F. vesca mutants 16 . The main flavonols present in strawberries are glycosides and glucuronides of kaempferol and quercetin. The phenolic acids are predominantly cinnamic acid derivatives, while the most abundant ellagitannin in strawberry is agrimoniin 10 . Although the phenolic composition of ripe strawberries has been shown to vary considerably between cultivars [10][11][12] and was demonstrated to be under strong genetic control in diploid Fragaria species 17 , knowledge of the genetic basis of polyphenol biosynthesis and accumulation in strawberry remains scarce.
In order to breed for higher concentrations of healthrelated phytochemicals such as polyphenols in the cultivated strawberry, it is first essential to understand the inheritance of such compounds. Many quantitative trait loci (QTL) mapping studies have been undertaken to investigate various aspects of cultivated strawberry fruit quality, including the identification of QTL for total anthocyanin content 18 , overall sweetness, titratable acidity and ascorbic acid content 19 , and fruit primary metabolite content 20 . More recently, a study of a 'Delmarvel' × 'Selva' progeny identified many QTL similar to previous studies 21 , and QTL for fruit quality have been identified in 23 related F 1 progenies using a pedigree-based approach 22 .
While numerous reported studies have characterized the polyphenolic content of the ripe fruit of a diverse array of strawberry cultivars 9,11,12,23 , the only investigation to date that has precisely characterized the ripe fruit content of individual polyphenolic compounds in a segregating mapping population was that of Urrutia et al. 17 in diploid Fragaria. In that study, the authors determined the polyphenol content of the fruits of a wild diploid NIL progeny using LC-ESI-MS n and reported 76 stable QTL for the genetic control of 22 distinct polyphenolic compounds including anthocyanins, flavonols, flavan-3-ols, flavanones, hydroxycinnamic acid derivatives, and ellagitannins. However, for the most part, the QTL intervals defined in that study were large, spanning considerable physical distances on the diploid F. vesca genome.
The aim of this investigation was to study the inheritance of genes controlling polyphenol biosynthesis in the cultivated strawberry by characterizing the ripe fruits of three mapping populations from parental lines that had previously been shown to differ in the polyphenol content of their berries 10 using HPLC-DAD-MS n against known standards. The progenies of the three mapping populations were genotyped, and GWAS and QTL analyses were performed. Following the identification of resistance QTL and genetic loci of major effect, the diploid F. vesca and octoploid cultivated strawberry reference genome were mined for candidate genes.

Characterization of polyphenolic compounds in parental and progeny lines
The mean fruit concentrations for six anthocyanins, five cinnamic acids, four ellagic acid derivatives, one ellagitannin (agrimoniin), and five flavonols, are presented in Table 1. For the parents, standard errors are presented together with the marginal means to indicate the precision of the measurements, while for the F 1 hybrids, standard deviations are given to demonstrate the spread of the observations in each progeny 24 . A principal components plot derived from all polyphenolic compounds in the four parental cultivars (Table 1) is given in Fig. 1. Table 1 Concentrations (mg 100 g −1 of FW) of phenolic compounds analyzed in ripe fruit samples in the parental genotypes of the three mapping populations studied, along with the mean concentrations observed in the progeny of each mapping population Polyphenol:

Anthocyanins:
Cyanidin-3-O-glucoside Genome-wide association analysis GWAS were conducted using 24,062 informative markers with their relative positions derived from the F. vesca v4.0 reference genome 25 . Of these SNPs, a total of 4,317 were placed reliably on the F. × ananassa 'Camarosa' genome sequence 26 by Hardigan et al. 27 . The GWAS analyses identified significant marker-trait associations for the polyphenols pelargonidin-3-O-malonylglucoside, pelargonidin-3-O-acetylglucoside, cinnamoyl glucose, and ellagic acid deoxyhexoside in the combined progenies dataset that exceeded the −log10 (p) value significance threshold of 6.5. A total of 163 significant associations were identified with SNP markers for pelargonidin-3-Omalonylglucoside (Fig. 2a), 60 were identified for pelargonidin-3-O-acetylglucoside (Fig. S1), 33 were identified for ellagic acid deoxyhexoside (Fig. 2b), while a single significant association was determined for cinnamoyl glucose (Fig. S1). The MLM model approach confirmed the associations observed in the basal GLM model in all four instances (Fig. S1). All significant loci identified from the 24,062 informative markers with a position on the F. vesca genome were confirmed in the 'Camarosa' reference genome (Fig. S2), however, only in the case of pelargonidin-3-O-acetylglucoside were the most significant SNPs from the informative GWAS marker set mapped to the 'Camarosa' genome. The most significant SNPs at each locus are given in Table 2 along with the physical interval in which all significant markers were located on the F. vesca v4.0 reference genome.

QTL analysis and mapping of traits under single major gene control
Significant QTL were identified corresponding to the genomic intervals containing GWAS associations for pelargonidin-3-O-malonylglucoside, pelargonidin-3-Oacetylglucoside, cinnamoyl glucose, and ellagic acid deoxyhexoside (Fig. 3). A significant QTL was identified on LG1X2 of the 'Saga' × 'Senga Sengana' (S×SS) mapping population for ellagic acid deoxyhexoside; significant QTL were identified for cinnamoyl glucose on linkage groups LG3b in the 'Carisma' × 'Senga Sengana' (C×SS) and S×SS mapping populations and LG6A in the 'Marlate' × 'Senga Sengana' (M×SS) and S×SS populations, and significant QTL were identified for pelargonidin-3-O-malonylglucoside and pelargonidin-3-O-acetylglucoside on LG6b of the C×SS and S×SS populations (Table 3). Axiom marker data   27 enabled linkage groups on the three mapping populations to be assigned to chromosome sequences on the 'Camarosa' cultivated strawberry reference genome sequence of Edger et al. 26 . Between the three mapping populations, 51 markers from linkage group LG1X2 were anchored to 'Camarosa' chromosome Fvb1-3, 59 markers were anchored to chromosome Fvb3-2, 24 markers were anchored to Fvb6-1, and 22 markers were anchored to Fvb6-2. No ambiguities were observed within linkage groups or between homologous groups between mapping populations relating to the chromosomes to which they were anchored on the 'Camarosa' genome sequence.
Following GWAS and QTL analysis, the concentrations of ellagic acid deoxyhexoside observed in ripe fruit of the progeny of the S×SS population, and pelargonidin-3malonylglucoside observed in ripe fruit of the progeny of the C×SS and S×SS mapping populations were scored as a qualitative presence/absence phenotype and linkage mapping confirmed discrete genetic positions of the traits indicating that they were under the control of a single major gene in these populations. Loci controlling ellagic acid deoxyhexoside and pelargonidin-3-O-malonylglucoside biosynthesis co-segregated with markers mapped to genomic intervals on chromosome Fvb1-3 of the 'Camarosa' genome sequence between 6,921,203 bp and 7,554,779 bp (an interval of 633,576 bp) and chromosome The QTL for pelargonidin-3-O-malonylglucoside and pelargonidin-3-O-acetylglucoside were identified in the same region of linkage group LG6b in both the C×SS and S×SS linkage maps (Fig. 3). However, when phenotypic data for pelargonidin-3-O-acetylglucoside were scored qualitatively, the segregation data did not fit the hypothesis of a single major gene controlling the biosynthesis of this compound in either population. A second significant QTL for pelargonidin-3-O-acetylglucoside biosynthesis was identified on LG6X2 in the S×SS mapping population. While marker genotypes were not completely predictive of phenotype, due to the genetic distance between mapped markers and the genetic loci controlling the trait variation, the combination of homozygous genotypes in this progeny of the most significantly associated markers on both LG6X2 and LG6b produced the highest and lowest concentrations of pelargonidin-3-O-acetylglucoside in the ripe fruit of the progeny (Fig. 5). The LG6X2 QTL was not recovered in the C×SS progeny.
Allele-effect box plots of SNP markers co-segregating in individual populations with qualitative phenotypic trait scores for ellagic acid deoxyhexoside and pelargonidin-3-O-malonylglucoside (AX-166511049 and AX-166507798, respectively) are shown in Fig. 6. The allele effects for SNPs co-segregating with pelargonidin-3-O-malonylglucoside and ellagic acid deoxyhexoside were predictive of polyphenol concentrations across all three mapping populations.

Candidate gene identification
Initially, a search for candidate genes was performed within the major gene intervals identified in the F. vesca v4.0 genome sequence where a total of 561 gene predictions were identified in the 3 Mb genomic interval (6-9 Mb) on Fvb1 containing the locus controlling ellagic acid deoxyhexoside biosynthesis, and 408 gene predictions were identified in the 2 Mb genomic interval (34-36 Mb) on Fvb6 containing the locus controlling pelargonidin-3-O-malonylglucoside biosynthesis. Of those on Fvb1, a single gene was identified as a potential candidate for ellagic acid deoxyhexoside biosynthesis, while seven genes were identified as candidates for pelargonidin-3-O-malonylglucoside biosynthesis on Fvb6 (Table 4).
Gene FvH4_1g12660, located in the Fvb1 interval and annotated as a putative UDP-rhamnose: rhamnosyltransferase 1 was identified as the most likely candidate for the gene controlling ellagic acid deoxyhexoside biosynthesis.  Transcript abundance data were not available for the cultivated strawberry species F. × ananassa for the candidate genes identified, however, data for the related diploid species F. vesca, housed on the F. vesca eFP browser 16 (bioinformatics.towson.edu/strawberry) showed that the gene was differentially expressed during fruit development in the F. vesca cultivars 'Ruegen' and 'Yellow Wonder' with transcript levels observed in 'Yellow Wonder' ripe fruit higher than in 'Ruegen' (Table 5).
Following identification and annotation of candidate genes in the F. vesca v4.0 genome sequence, gene predictions within the major gene intervals in the 'Camarosa' genome sequence were annotated, and a total of 361 gene predictions were identified in the 2.5 Mb genomic interval (6.5-9 Mb) on Fvb1-3 containing the locus controlling ellagic acid deoxyhexoside biosynthesis, and 384 gene predictions were identified in the 2.5 Mb genomic interval (31-33.5 Mb) on Fvb6-2 containing the locus controlling pelargonidin-3-O-malonylglucoside biosynthesis.
A search was then made for homologous genes between 'Camarosa' and F. vesca at the major gene loci for ellagic acid deoxyhexoside and pelargonidin-3-O-malonylglucoside biosynthesis identified using the Tripal synteny viewer implemented on the Genome Database for Rosaceae 29 . The syntenic block fafvB1334 identified between chromosomes Fvb1 of the F. vesca v4.0 genome sequence Table 3 QTL identified in the three mapping progeny detailing the mapping population in which they were identified, the linkage group, the marker(s) most significantly associated with the variance observed, the LOD, the percentage variance explained and the genetic and physical positions of the peak LOD of each QTL on the 'Camarosa' genome sequence where an association was found

Discussion
Polyphenol compounds are emerging as potent types of phytochemicals with pleiotropic effects on human health imparted through a dynamic interaction with the gut microbiome. Ingested polyphenols modulate microbiota community composition while microbiota enzymatically transform polyphenols into bioavailable compounds with a range of activities including anti-inflammatory and neuroprotective effects 5 . It is to be expected then, that fruit with increased polyphenol compounds might come into focus of breeding efforts for crops worldwide, including strawberries and other Rosaceous crops. Hence it is timely and of importance to improve our understanding of the genetics basis of polyphenol biosynthesis and accumulation in crops used for human consumption. The genetics of specific polyphenol compound biosynthesis were investigated in cultivated strawberry here for the first time using three mapping populations raised from parental genotypes previously shown to differ in the polyphenol content of their berries 10 . The concentrations of phenolic compounds observed in the parental lines and progenies were in the range previously reported in strawberries 10-12 and significant QTL were identified for four of these polyphenols in the mapping populations studied; pelargonidin-3-O-malonylglucoside, pelargonidin-3-O-acetylglucoside, cinnamoyl glucose, and ellagic acid deoxyhexoside. Moreover, the concentrations of two of the compounds in the fruits of the mapping populations: pelargonidin-3-O-malonylglucoside and ellagic acid deoxyhexoside, were mapped as qualitative traits and were shown to be controlled by a single major gene.

Genetic control of ellagic acid deoxyhexoside production
Deoxyhexoses are a class of six-carbon monosaccharides that have had one or more of their hydroxyl groups replaced with hydrogen atoms and include rhamnose, arabinose, quinovose and fucose. Aaby et al. 9 reported the presence of ellagic acid deoxyhexoside in the ripe fruits of the cultivated strawberry, with subsequent studies also reporting the presence of the compound in strawberry fruit [10][11][12]30 . Far less is known about the biosynthesis of ellagitannins than about the biosynthesis of phenylpropanoids, flavonoids, and anthocyanins. Gallic acid is the basic precursor of ellagitannin biosynthesis, and esterification of gallic acid and uridine-5′-diphosphate glucose (UDP-glucose) by 1-O-acylglucose glucosyltransferases leads to the formation of β-glucogallin 31 , which is converted to 1,2,3,4,6-pentagalloylglucose by 1-O-acylglucose dependent acyltransferases 32 . From here it is suggested that 3,4,5,3′,4′,5′-hexahydroxydiphenoyl moieties are produced through oxidation 33 , and that ellagic acid is then formed by hydrolysis 34 . A recent study in wild and cultivated Fragaria species characterized five 1-O-acylglucose glucosyltransferases 35 , with one gene FaGT2, physically located at 4,152,828 bp on Fvb2 of the F. vesca v4.0 genome sequence, shown to be responsible for β-glucogallin biosynthesis.
To date, other genes producing proteins that catalyze reactions within the ellagitannin biosynthesis pathway have not been characterized, nor have genes responsible for the formation of ellagic acid deoxyhexoside. However, Urrutia et al. 17 reported a major QTL for ellagic acid biosynthesis in diploid Fragaria in the NIL interval Fb1.26-61, which spanned the physical interval between 3,315,998 and 20,747,404 bp on the diploid Fragaria genome, and was therefore not the FaGT2 locus reported by Schulenburg et al. 35 . The 633 kb region mapped in this investigation is within the physical interval of the QTL identified by Urrutia et al. 17 . However, it is unlikely that the two loci are orthologous because there was no correlation between the concentrations of ellagic acid and ellagic acid deoxyhexoside in fruits of the S×SS mapping population. Moreover, the lack of an identifiable QTL for ellagic acid production on LG1X2 in this investigation suggests that they are under independent genetic control.
Since ellagic acid and ellagic acid deoxyhexoside concentrations were not correlated, we hypothesized that the gene underlying the locus identified on LG1X2 in this investigation catalyzed the glycosylation of ellagic acid using a deoxyhexose sugar substrate, leading to the formation of ellagic acid deoxyhexoside. Within the physical interval characterized in this investigation in F. vesca, a single candidate gene with a putative role in ellagic acid deoxyhexoside biosynthesis, FvH4_1g12660, was identified. Gene FvH4_1g12660 was annotated as a putative UDP-rhamnose:rhamnosyltransferase 1, which has previously been shown to be involved in flavonoid modification in Lobelia erinus 36 . Rhamnose is a deoxyhexose sugar that has been shown to be present in the ripe fruits of the cultivated strawberry 37 . Ellagic acid rhamnosides have been identified in the stem bark of Syzygium guineese 38 and more recently in the fruits of Rubus ulmifolius 39 , a close relative of the genus Fragaria. It is therefore plausible that the ellagic acid deoxyhexoside produced in cultivated strawberry is ellagic acid rhamnoside and that its biosynthesis is under the control of the candidate gene FvH4_1g12660. While caution should be exercised when comparing data between related species, the transcript abundance profile of FvH4_1g12660 in F. vesca showed that it was upregulated during fruit development, and that transcript levels in the white-fruited cultivar 'Yellow Wonder' were higher than those observed in the red-fruited cultivar 'Ruegen' (Table 5). This observation was consistent with the findings of Roy et al. 40 who reported a greater accumulation of ellagitannins in white-fruited over the red-fruited F. vesca cultivars. Taken together, the physical location of the FvH4_1g12660 gene, and previously reported transcript profiles for F. vesca suggests gene FvH4_1g12660 as a candidate for ellagic acid deoxyhexoside biosynthesis in cultivated strawberry on LG1X2, particularly as no other genes in the mapping interval were potentially involved in catalyzing glycosylation reactions. However, while fulllength homologs of the FvH4_1g12660 gene were identified within the physical interval on Fvb1 homeologues Fvb1-1 and Fvb1-2 in the 'Camarosa' genome, the Fvb1-3 homolog of the gene was absent from the 'Camarosa' assembly. No reports have been published to date as to whether 'Camarosa' is an ellagic acid deoxyhexoside producer; if it is a non-producer, it is possible that the complete deletion of the gene from the Fvb1-3 chromosome is responsible for the lack of ellagic acid deoxyhexoside production in some cultivated strawberry accessions. Further analyses need to be performed to functionally characterize the gene and its expression in cultivated strawberry, and demonstrate if it has a role in controlling ellagic acid deoxyhexoside biosynthesis, or if another candidate gene in this region is the causal genetic agent.

Genetic control of pelargonidin-3-O-malonylglucoside formation
Anthocyanins are the class of pigments that give ripe strawberry fruits their red color and their concentrations significantly vary between cultivars. The total anthocyanin content in strawberry fruits is predominantly composed of pelargonidin-3-O-glucoside, pelargonidin-3-O-malonylglucoside, cyanidin-3-O-glucoside and pelargonidin-3-O-rutinoside, and the balance between Table 5 Relative transcript abundance of the eight candidate genes identified in this investigation in the F. vesca cultivars 'Ruegen' (red-fruited) and 'Yellow Wonder' (white-fruited) during fruit development these anthocyanins affect the color of the ripe berries 23,41 . Pelargonidin-3-O-malonylglucoside was first identified in strawberry fruits by Tamura et al. 42 , who noted its presence in the Japanese cultivars 'Nyoho' and 'Reiko', but did not detect it in 'Ai-berry' or 'Toyonoka'. Later, Yoshida et al. 23 studied the levels of pelargonidin-3-O-malonylglucoside in relation to fruit color in 20 cultivars of mainly Japanese origin and reported that nine of the cultivars studied were non-producers, while the remaining eleven were producers. In more recent studies, pelargonidin-3-O-malonylglucoside was determined to be the second most abundant anthocyanin in ripe red fruits of 27 10 and 90 12 strawberry cultivars, with concentrations ranging from 0.0 to 20.8 mg 100 g −1 of FW. The results of these studies, demonstrating the presence or absence of the production of pelargonidin-3-O-malonylglucoside between strawberry cultivars, suggests that a mutation in a single major gene is responsible for the absence of the compound in some cultivars. However, to date, the inheritance and genetic control of pelargonidin-3-O-malonylglucoside has not been studied in the cultivated strawberry.
Here, a major QTL for the presence/absence of pelargonidin-3-O-malonylglucoside was observed, and a qualitative interpretation of segregation data from two mapping populations (C×SS and S×SS), demonstrated the presence of a mutation in a single major gene locus determining the lack of biosynthesis of the compound in a 1,768,417 bp interval on Fvb6-2 of the F. vesca genome between 31,229,150 and 32,997,567 bp. Lerceteau-Köhler et al. 18 identified a QTL for total anthocyanin content on LGVIb of the 'Capitola' × 'CF1116' mapping population, and identified the microsatellite marker EMFv010 43 as the transferrable genetic marker most closely associated to the trait. While the precise location of the QTL was not reported in that study, the physical position of EMFv010 on Fvb6 of the F. vesca v4.0 genome sequence is 31,622,577 bp, which is within the genetic interval defined in this investigation. More recently, Urrutia et al. 17 studied the inheritance of numerous polyphenol compounds in an interspecific diploid near isogenic line population, and mapped a minor QTL for pelargonidin-3-O-malonylglucoside explaining 10% of the observed phenotypic variation to a 6.4 Mb genomic interval between 32,907,471 and 39,317,498 bp on Fvb6 of the F. vesca v4.0 genome sequence, which encompasses the interval defined in this investigation. Thus it is highly likely that an orthologous locus controls pelargonidin-3-O-malonylglucoside production in diploid and octoploid Fragaria.
Pelargonidin-3-O-malonylglucoside in cultivated strawberry is synthesized via pelargonidin, which is converted to pelagonidin-3-O-glucoside by the activity of anthocyanidin glucosyltransferase FaGT1. Subsequently, malonylation is achieved by a previously uncharacterized malonyltransferase to give pelargonidin-3-O-malonylglucoside 44 . Our investigation revealed that the production of pelargonidin-3-Omalonylglucoside in cultivated strawberry is the result of the action of a mutation in a single major gene which determines the qualitative presence or absence of the compound, while the concentration of pelargonidin-3-Oglucoside remains relatively unchanged. It has been previously reported that silencing of the anthocyanidin glucosyltransferase GT1 in F. × ananassa leads to a reduction in the levels of both pelargonidin-3-O-malonylglucoside and pelargonidin-3-O-glucoside in strawberry fruits, and that the transcript levels of FaGT1 increase as ripening progresses, with highest transcript abundance in ripe red berries 28 . The FaGT1 locus is located on Fvb7 of the Fragaria genome, and is thus not the locus controlling pelargonidin-3-O-malonylglucoside biosynthesis. Given that the locus on Fvb1 identified in this investigation does not affect pelargonidin-3-O-glucoside biosynthesis, we postulated that the locus we identified on LG6b of the S×SS mapping population was likely to be the malonyltransferase gene catalyzing this final step of the pathway as described above.
The enzyme malonyl-CoA:anthocyanin 5-O-glucoside-6-O-malonyltransferase was first shown to catalyze the malonylation (or aliphatic acylation) of anthocyanins in plants by Suzuki et al. 45 in scarlet sage (Salvia splendens) and was more recently characterized in Arabidopsis (At5MAT 46 ), where it was shown to be the gene responsible for synthesizing malonyl-modified anthocyanins. Dissection of the genetic interval on LG6b revealed a total of seven candidate genes with a putative role in polyphenol production. Each of these candidate genes was annotated as having malonyltransferase activity, with high homology to a predicted malonyl-CoA:anthocyanin 5-Oglucoside-6-O-malonyltransferase. The data of Hawkins et al. 16 from previous expression analyses in cultivars of the diploid strawberry F. vesca showed that three of the seven candidate genes for pelargonidin-3-O-malonylglucoside biosynthesis identified here at the LG6b locus were highly expressed in ripening fruit tissues.
An evaluation of the physical region controlling pelargonidin-3-O-malonylglucoside biosynthesis on chromosome Fvb6-2 on the 'Camarosa' genome revealed two predicted genes with orthology to five of those predicted to have malonyltransferase activity from the F. vesca genome sequence. These three genes were the most likely candidates from the 'Camarosa' physical interval, and thus, we propose one of these three genes as the malonyltransferase that catalyzes the formation of pelargonidin-3-O-malonylglucoside from pelargonidin-3-Oglucoside. Due to the octoploid nature of the 'Camarosa' genome, further genetic characterization of this gene region, and functional characterization of these candidates in producing and non-producing F. × ananassa germplasm is required to demonstrate thoroughly their role in synthesizing malonyl-modified anthocyanins in the ripe fruit of cultivated strawberry.
It is also likely that one of the identified candidate genes at the Fvb6-2 locus was responsible for pelargonidin-3-Oacetylglucoside production in the C×SS and S×SS mapping populations, in tandem with a second genetic factor mapped to LG6X2 in the S×SS mapping progeny. However, since the genetic factor on LG6X2 was associated with a single Axiom marker on the S×SS linkage map, the genetic interval in which it resides was too large to predict likely candidate genes for its control. Further work will be required to functionally validate the role of the candidate genes on LG6b in pelargonidin-3-O-acetylglucoside production, as well as to narrow the genetic interval in which the second genetic factor is located on LG6X2 in order to identify suitable candidate genes at this locus. Pelargonidin-3-O-glucoside was purchased from Polyphenols Laboratories AS (Sandnes, Norway). Formic acid (98-100%) and methanol were obtained from Merck KGAa (Darmstadt, Germany). Acetonitrile was sourced from VWR Chemicals (Fontenay-sous-Bois, France). All solvents were of HPLC or analytical grade, and water was of Milli-Q quality (Millipore Corp., Bedford, MA, USA).

Polyphenolic compound extraction
Fruit samples from the four parental lines and from the progeny of the three mapping populations were partially thawed and homogenized in a food processor (CombiMax 700, Braun GmbH, Kronberg, Germany). Phenolic compounds were extracted from duplicate aliquots (10 g) with methanol (20 ml) by homogenization in a Polytron, PT3100 homogenizer (Kinematica AG, Littau, Switzerland) at 28,000 rpm for 30 s. The extracts were centrifuged at 39,200 × g for 10 min at 20°C (Avanti J-26 XP Centrifuge, Beckman Coulter, USA), following which the supernatants were collected and the insoluble plant material was re-extracted as above with 70% methanol (20 ml). The two pooled supernatants of each sample were combined, and the volume was made up to 50 ml with 70% methanol and stored at −80°C until analyzed.

Polyphenolic compound analysis with HPLC-DAD-MS n
Extracts were filtered through Millex HV 0.45-μm filters (Merck Millipore Ltd., Cork, Ireland), and analyzed using an Agilent 1100 series HPLC system (Agilent Technologies, Waldbronn, Germany) equipped with an auto-sampler cooled to 4°C, a diode array detector, and a MSD XCT ion trap mass spectrometer fitted with an electrospray ionization interface as previously described 10 . Chromatographic separation was performed on a Synergi 4-μm MAX RP C12 column (250 mm × 2.0 mm i.d.) equipped with a 5-μm C12 guard column (4.0 mm × 2.0 mm i.d.), both from Phenomenex (Torrance, CA, USA), with one mobile phase consisting of formic acid/water (2/98, v/v) and a second consisting of acetonitrile. Column temperature was 40°C and injection volume 10 μl. The phenolic compounds were identified based on their UV-vis spectra (220-600 nm), mass spectra and retention times relative to external standards and comparison with previous results 10, 47 . The phenolic compounds were classified based on their characteristic UV-vis spectra and quantified by external standards. Anthocyanins were quantified against pelargonidin-3-O-glucoside (at 520 nm), flavonols against rutin (at 360 nm), ellagic acid and ellagic acid glycosides against ellagic acid (at 360 nm), and the ellagitannin agrimoniin against gallic acid (at 260 nm). Hydroxy cinnamic acids (HCA) (at 320 nm) and cinnamoyl glucose (at 280 nm) were quantified against chlorogenic acid (at 320 nm). The results were expressed as mg per 100 g of fresh weight (mg 100 g −1 FW).

Phenolic compound quantification for genetic analysis
Means and standard errors for the concentrations of all polyphenol compounds were calculated for each of the parental cultivars, while for the F 1 progeny the means and standard deviations were calculated. The computations were done with the R function 'aggregate' 48 . The polyphenol concentrations were scaled using the 'preProsess' function in the R package Caret 49 with the option 'method = scale' which divides each observation by the standard deviation of the analyzed compound. A principal component analysis (PCA) was subsequently performed on the scaled data using the prcomp package from core R. Finally, the scores of the two first dimensions in the PCA were plotted with the ggplot package 50 .

Molecular marker acquisition
DNA was extracted from the parental lines and the progeny of the three mapping populations with the DNeasy Plant Minikit (Qiagen) and quality was determined using a QIAgility spectrophotometer (Qiagen). Samples passing the minimum quality threshold (a 260/280 ratio between 1.8 and 2.0) were normalized to 10 ng μl −1 following quantification using a Qubit fluorometer (Thermo Scientific) against known standards. A total of 140 progenies from the three mapping populations (C × SS (n = 48); M × SS (n = 47); S × SS (n = 45)) and the four parental cultivars were genotyped using the Axiom i35K strawberry array (Ther-moFisher) 51 on a GeneTitan instrument (ThermoFisher). Genotyping was done for each individual using the Axiom Analysis Suite software (ThermoFisher) running default quality and SNP-calling parameters.

Genome-wide association analysis (GWAS)
The segregation data were filtered with snpReady 52 using a minor allele frequency of 0.05 and markers with more than 5% missing data were excluded from further analysis. Missing data were imputed using the 'knni' option in snpReady 53 . All data were combined into a single dataset for subsequent analysis. Since there was no a priori way of assigning all Axiom markers to the subgenomes of the F. × ananassa 'Camarosa' genome sequence 26 for GWAS analysis, the sequences of all informative iStraw35k array markers were used as queries to interrogate the F. vesca v4.0 genome sequence 25 using a local BLAST database running default parameters. The positions determined for each marker were used for GWAS and subsequently, a subset of markers that were assigned sub-genomic locations by Hardigan et al. 27 were used to create a Manhattan plot from 'Camarosa' genome sequence coordinates, and for the identification of genomic intervals for candidate gene identification. A GWAS was conducted for every polyphenol compound analyzed (Table 1) using GAPIT 54 , implemented in R 48 . The kinship matrix 55 was calculated using all the useful markers, and five principal components were included as covariates. The primary model was constructed with the general linear model (GLM) algorithm 56 using a minor allele frequency (MAF) of 0.05. An alternative model was developed using the mixed linear model (MLM) 57 algorithm with the same covariates as with the GLM. Manhattan plots of the −log 10 (p) values were created with the R package qqman 58 using default settings including the 'suggestiveline' (−log 10 (1e−5)) and 'genomewideline' (−log 10 (5e−8)) arguments.
Linkage map construction, QTL analysis, and mapping of qualitative trait loci Data for each mapping population were considered separately for linkage map construction. The SNP data from the parental genotypes of each progeny were scrutinized initially, and those SNPs that were heterozygous in at least one parent were retained. The remaining monomorphic SNPs, and those for which data from either parent were missing were discarded. The SNP segregation data were retained for further analysis if the progeny contained only genotypes predicted from the parental genotype combination, and for which there were 8% or fewer missing values. Linkage maps were constructed separately for each mapping progeny from all informative markers using JOINMAP 4.1 (Kyazma, NL). Marker placement was determined using regression mapping with a minimum logarithm of odds (LOD) score threshold of 3.0, a recombination fraction threshold of 0.35, a ripple value of 1.0, a jump threshold of 3.0 and a triplet threshold of 5.0. Mapping distances were calculated using the Kosambi mapping function, and homeologous subgenome determination (A, b, X1, and X2) for each linkage group followed the nomenclature of Sargent et al. 59 . The marker data of Hardigan et al. 27 were used to determine the 'Camarosa' chromosome homeologues for each of the identified linkage groups and subsequent BLAST analysis of specific chromosome homeologues was performed to assign a physical position to all mapped markers not placed on the 'Camarosa' genome sequence by Hardigan et al. 27 .
QTL analyses were performed separately for the three mapping progenies using interval mapping implemented in MAPQTL 6.0 (Kyazma, NL) with a step size of 1.0 cM, and the percentage phenotypic variance explained and associated LOD values were calculated. The linkage maps and associated LOD plots presented were plotted with Map-Chart 2.1 using the chart function. Following identification of significant major QTL, some traits were coded as qualitative traits and mapped as discrete genetic loci using JOINMAP 4.1 following the procedure described above. Allele-effect box plots of markers co-segregating in individual populations with qualitative phenotypic trait scores were plotted using the quantitative phenotypes for individuals in all three mapping populations and the genotypes of the most closely associated markers mapped.