Publisher Correction: Diversity and function of terpene synthases in the production of carrot aroma and flavor compounds

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

reactions between IDP and DMADP lead to the formation of cis-or trans-prenyl diphosphates that include geranyl diphosphate (GDP, C 10 ), neryl diphosphate (NDP, C 10 ), (Z,Z)-farnesyl diphosphate (FDP, C 15 ), and geranylgeranyl diphosphate (GGDP, C 20 ) in plastids, and (E,E)-FDP in the cytosol. The prenyl diphosphates are then further converted in these compartments by enzymes of the terpene synthase (TPS) family into structurally diverse volatile monoterpenes and sesquiterpenes or semi-volatile and non-volatile diterpenes (C 20 ). The TPS superfamily is divided into seven sub-families 11 with TPSs from angiosperms residing in families a, b, c, e/f, and g. TPS-a and TPS-b subfamilies include primarily sesqui-TPSs and mono-TPSs, respectively, while di-TPSs are found in the c and e/f clades. Mono-TPSs, sesqui-TPSs, and di-TPSs representing the g subfamily typically make linear terpenes and lack the highly conserved RRX 8 W motif characteristic of mono-TPSs of the TPS-b clade. TPS genes often undergo species specific divergence and duplications resulting in terpene metabolic plasticity and adaptations 12 . While the structural diversity and biosynthetic evolution of terpenes have been studied extensively in a variety of crops (e.g. maize, tomato, strawberry, peppermint) [13][14][15][16] , only two TPS genes from carrot, DcTPS01 and DcTPS02 17 , have been functionally analyzed to date, leaving a majority of the biosynthetic genes responsible for the biosynthesis of carrot terpene volatiles uncharacterized. Recently, the genome of the orange, doubled-haploid, Nantes-type carrot DH1 has been sequenced 18 . Genomic and transcriptomic analyses of this genotype estimated a family of 36 potentially functional TPS genes 18 . However, the latest analysis of the carrot TPS gene family predicted 65 full-length TPSs 19 . In conjunction with this study, several QTLs associated with TPS genes were predicted to correlate with distinct terpene compounds. To investigate these loci in more detail and determine the major enzymes contributing to carrot aroma and flavor, we performed biochemical characterizations of 19 carrot TPS genes based on their expression profiles in different tissues of DH1 (leaves, petioles and roots) and roots of field-grown colored carrot varieties (Red, Orange, Yellow and Purple). Employing random forest analysis, we determined distinct terpene representatives of each cultivar and predicted the TPS genes responsible for their biosynthesis based on cultivar-specific transcriptome profiles. As terpene content strongly affects carrot flavor and aroma 20 , results from this study can be applied to enhance carrot palatability and overall carrot quality.

Experimental Results
Analysis of terpene volatiles in DH1 carrot leaves, petioles and roots. Volatile terpenes were extracted from leaves, petioles, and roots of the doubled-haploid carrot DH1 and qualitatively and quantitatively analyzed using GC-MS and GC-FID, respectively. We found that the tissues contained a diverse blend of terpene compounds including 18 major monoterpenes and sesquiterpenes (Fig. 1). Leaf tissues contained high levels of the monoterpenes α-pinene, β-myrcene and (E)-β-ocimene, and the sesquiterpenes δ-elemene, (E)β-caryophyllene and germacrene D ( Fig. 1; Supplementary Table S1). Comparable profiles were obtained from petioles with the exception of lower levels of β-myrcene and germacrene D ( Fig. 1; Supplementary Table S1). Root tissues showed reduced levels of α-pinene, and increased levels of γ-terpinene and α-terpinolene compared to above ground tissues ( Fig. 1; Supplementary Table S1). Other putative sesquiterpene volatiles were not reported due to low levels of abundance and lack of authentic standards or oils for verification.
Identification of TPS Gene Models in the Carrot Genome. The carrot reference genome (Phytozome v12, Daucus carota v2.0, DH1), and publically available RNA-seq data sets (SRA SAMN03216637, cv. DH1) were queried for TPS genes using NCBI TBLASTX. We identified 52 putative TPS gene models including the 36 TPS genes previously predicted from DH1 by Iorizzo, et al. 18 . Although Iorizzo, et al. 18 previously generated a TPS nomenclature based on chromosomal positioning, we adopted the most recent TPS naming system for D. carota proposed by Keilwagen,et al. 19 . Comparisons of the 52 TPS gene models against the reference genome revealed 43 unique full-length open reading frames (Table 1). Several TPS genes are located in biochemical gene clusters on chromosomes 1, 3, 4, 5, 7 and 8, including a dense five gene cluster on chromosome 4 ( Table 1, Supplementary  Fig. S1). Additional TPS gene models predicted by Keilwagen, et al. 19 in a genome-wide association study (GWAS) were not pursued further due to low transcript levels in roots, inability to amplify a full-length transcript, or identity with previously annotated TPSs (Supplementary Figs. S2 and S3).
Gene candidate selection. Gene candidates for biochemical characterization were first screened by tissue specific RNA-seq analysis of DH1 and root specific RNA-seq analysis of colored carrots (Supplementary Figs. S2 and S3). TPS gene candidates with high in silico transcript levels were further selected based on the ability to obtain full-length transcripts and real time qRT-PCR amplicons across multiple tissues ( Fig. 3; Supplementary  Fig. S10). Full-length cDNAs or cDNAs with truncated plastidial transit peptides (19 in total) were constructed for all root-expressed TPS genes (DcTPS03, DcTPS10, DcTPS11, DcTPS14, DcTPS15, DcTPS25, DcTPS26, DcTPS28 and DcTPS30), genes with high expression in above ground tissues (DcTPS04, DcTPS07, DcTPS19, DcTPS23, DcTPS42, DcTPS48, DcTPS53) and any additional TPS genes associated with QTLs (DcTPS27, DcTPS54 and DcTPS55) identified by Keilwagen, et al. 19 . In vitro TPS assays with the recombinant partially purified TPS proteins were performed using common TPS substrates (GDP, NDP, (E,E)-FDP, (Z,Z)-FDP and GGDP) and terpene products were analyzed by headspace SPME-GC-MS.

Characterization of TPS-a Clade
Genes. In addition to DcTPS01, which was previously reported as an (E)-β-caryophyllene synthase 17 , five full-length cDNAs were isolated for TPS-a type genes DcTPS07, DcTPS11, DcTPS15, DcTPS42, and DcTPS53 based on expression profiling as described above. DcTPS11 was found to be most highly expressed in aboveground tissues including young leaves, matures leaves and petioles (Fig. 3). The recombinant DcTPS11 protein converted (E,E)-FDP into germacrene D as one of its major enzymatic products (Fig. 4). Similarly, DcTPS07, which showed highest transcript abundance in petioles (Fig. 3), encodes a protein that exclusively formed germacrene D from (E,E)-FDP (Fig. 4). As germacrene D is a major component of the carrot essential oil in aboveground tissues, it is likely that both DcTPS11 and DcTPS07 contribute to the formation of this compound in vivo. Another member of the TPS-a subfamily, DcTPS53, was expressed in mature leaves and the petiole and its recombinant protein was found to convert (E,E)-FDP to δ-elemene as a major product and constituent of DH1 leaf terpenes (Figs. 3 and 4). The recombinant protein of the root-expressed gene DcTPS15 had limited activity with all tested substrates ( Fig. 4; Supplementary Fig. S11). Enzyme assays with recombinant DcTPS42 demonstrated that the enzyme produced several putative sesquiterpene products from (E,E)-FDP including germacrene D (Fig. 4). Additional members of the TPS-a clade were not tested based on previous characterization (DcTPS01 17 ), low levels of constitutive expression, or inability to amplify a full-length transcript (DcTPS13 and DcTPS38).
DcTPS03, predicted to encode a root expressed mono-TPS based on transcriptome analysis, was found to be expressed at low levels in all tested tissues (Fig. 3). The truncated recombinant DcTPS03 protein converted GDP into α-terpinolene, which is a dominant component of carrot root essential oil (Fig. 1). In addition, DcTPS03 produced the monoterpenes α-phellandrene and limonene from GDP (and NDP) ( Fig. 4; Supplementary Fig. S11).
Five genes in the TPS-b clade (DcTPS04, DcTPS26, DcTPS27, DcTPS54 and DcTPS55) were previously reported to reside in a dense TPS gene cluster on chromosome 4 and correlate with a QTL for sabinene and terpinen-4-ol production in roots (Table 1, Supplementary Fig. S1) 19 . DcTPS04 and DcTPS26 share ~88% sequence identity with a major difference attributed to the presence of a putative 44 amino acid plastidial transit peptide in DcTPS04 ( Supplementary Fig. S5). Truncated DcTPS04 and full-length DcTPS26 produced similar volatile profiles with sabinene, limonene, β-myrcene, α-pinene, and α-terpineol from GDP (and NDP) (   19 Circles indicate bootstrap support of >80% where bootstrap replicates = 500. The tree was rooted with the gymnosperm ent-CPP synthase from Picea sitchensis (PsCPS). Clades representing different TPS subfamilies are indicated by different colors in the perimeter of the tree. TPSs that were functionally characterized in this study are marked with colored boxes and their primary enzymatic products are shown on the outside of the perimeter. TPSs marked in orange are primarily expressed in root tissue while TPSs marked in green are predominantly expressed in the petiole or leaves. TPSs expressed in above-and belowground tissues are marked in light green boxes with an orange shape outline. Functionally characterized TPSs without tissue-specific expression data are marked in grey.
In vitro enzyme assays with a truncated DcTPS30 protein led to the conversion of GDP (and NDP) into γ -terpinene as the major product ( Fig. 4; Supplementary Fig. S11). Because of the predominant expression of the DcTPS30 gene in DH1 roots it is likely that this gene is responsible for the accumulation of high levels of γ -terpinene in this tissue ( Figs. 1 and 3). Expression of the gene DcTPS48 was only detected in aboveground tissues and transcripts were highly enriched in mature leaves and petioles (Fig. 3). The partially purified DcTPS48 enzyme converted GDP (and NDP) into linalool, which could only be found at low levels in mature leaves ( Fig. 4; Supplementary Fig. S11). The recombinant proteins of DcTPS10 and DcTPS14, although expressed in above and/ or root tissues, did show only limited or no activity with any tested substrates (Figs. 3 and 4; Supplementary  Fig. S11). Other members of the TPS-b clade were not tested based on previous characterization (DcTPS02) 17 , low levels of constitutive expression, or inability to amplify a full-length transcript (Supplementary Fig. S2; DcTPS05, DcTPS09, DcTPS12, DcTPS16, DcTPS17, DcTPS21, DcTPS32, DcTPS33, DcTPS47, DcTPS52, and DcTPS62).
Several of the characterized recombinant TPS-b type proteins also converted C 15 and C 20 prenyl diphosphate substrates under in vitro conditions; however, the contribution of these reactions to sesquiterpene and diterpene formation in planta remains unclear based on the plastidial localization of the proteins, limited substrate availability, or absence of the enzymatic product in planta. Recombinant DcTPS03 and DcTPS48 showed limited sesquiterpene production with (E,E)-FDP but made several bisabolene isomers from (Z,Z)-FDP ( Supplementary  Fig. S11). DcTPS04 and DcTPS26 produced several sesquiterpenes from (E,E)-FDP (and (Z,Z)-FDP) including α-bergamotenes (DcTPS04) and β-bisabolene (DcTPS26) ( Fig. 4; Supplementary Fig. S11). In addition, DcTPS26 did convert GGDP into an unidentified diterpene hydrocarbon product ( Supplementary Fig. S14). www.nature.com/scientificreports www.nature.com/scientificreports/ DcTPS19 and DcTPS23 are Members of the TPS-g Subfamily. Based on sequence similarity to characterized genes in the TPS-g subfamily 22 , and the presence of a putative plastidial transit peptide, we predicted the recombinant protein of gene DcTPS19 to function as a mono-TPS ( Supplementary Fig. S6). DcTPS19 was found to be expressed at low levels in all tested tissues except young leaves (Fig. 3). The DcTPS19 protein converted GDP (and NDP) into linalool but also accepted (E,E)-FDP (and (Z,Z)-FDP) as substrates to make nerolidol ( Fig. 4; Supplementary Fig. S11). Linalool could only be detected at low levels in leaves and may be further modified in vivo to non-volatile derivatives, e.g. by glycosylation. Another gene in the TPS-g family, DcTPS23, showed low expression in all tissues with highest transcript levels in petioles and roots (Fig. 3). Enzymatic activity of the recombinant DcTPS23 protein was limited with all substrates (Fig. 4; Supplementary Fig. S11). The remaining genes in the TPS-g subfamily (DcTPS45, DcTPS46, and DcTPS60) were not characterized based on low levels of expression in roots or inability to amplify full-length cDNAs. γ-domain characteristic of diterpene synthases involved in primary and secondary metabolism. In carrot, we identified three TPS genes in the TPS-c subfamily, of which DcTPS25 was expressed in above and belowground tissues in contrast to low expression of genes DcTPS57 and DcTPS59 ( Fig. 3; Supplementary Fig. S10). The recombinant DcTPS25 protein was found to function as a class II diterpene cyclase converting GGDP into ent-copalyl diphosphate (CDP) based on mass spectral comparison of the acid hydrolyzed product ent-copalol (Fig. 5a) with ent-copalol derived from the Arabidopsis thaliana copalyl diphosphate synthase. No enzymatic activity was detected with any other substrate tested.
DcTPS28 in an ent-Kaurene Synthase in the TPS-e/f Subfamily. Of the three TPS-e/f type genes identified by RNA-seq analysis (DcTPS28, DcTPS29 and DcTPS56), we focused on DcTPS28 based on its expression in roots (Fig. 3). When the recombinant DcTPS28 was tested for class I diterpene synthase activity with GGDP as substrate, no product was detected. However, when co-expressed with a pGGeC plasmid carrying a GGDPS gene from Abies grandis and a CPS gene from Arabidopsis thaliana 23 , DcTPS28 converted ent-CDP into ent-kaurene (Fig. 5b). ent-Kaurene could also be produced by co-incubating partially purified DcTPS25 and DcTPS28 with GGDP confirming the enzymatic activities of both enzymes (Fig. 5b). Production of ent-kaurene was verified by mass spectral comparison to products from a known ent-kaurene synthase of Bradyrhizobium japonicum.
Diverse colored root cultivars exhibit distinct volatile terpene profiles. Carrot cultivars of different color can be distinguished by distinct sensory qualities. To determine whether these differences correlate with modifications in terpene profiles, we performed a random forest analysis (see Methods for details) of 14 major monoterpene and sesquiterpene compounds in the colored cultivars P7262 (purple), R6637 (red), Y9244A (yellow) and B493B (orange) (Supplementary Fig. S15). This analysis revealed a strong separation of the colored genotypes (Fig. 6). Variable selection, using the R package Boruta, identified nine terpene factors as important in distinguishing the colored varieties (Table S5). We found that orange carrot roots in this study (cv. B493B) accumulated significantly higher levels of (E)-β-caryophyllene (ANOVA; p = 2.95e-05), α-humulene (ANOVA; p = 1.03e-04) and bornyl acetate (ANOVA; p = 4.23e-04) compared to red, purple and yellow cultivars (Fig. 7). In addition, yellow carrots (cv. Y9244A), accumulated high levels of β-bisabolene (ANOVA; p = 2.02e-03) and (E)-γ-bisabolene (ANOVA; p = 7.51e-03) in comparison to the other tested cultivars (Fig. 7). Although α-terpinolene significantly contributed to cultivar differences (Table S5; ANOVA; p = 0.046), no significant pairwise differences were detected among cultivars (Fig. 7). To determine if the observed cultivar specific terpene differences correlated with the expression of particular TPS genes, we analyzed TPS transcript levels from RNA-seq data of all cultivars using the Bioconducter package Limma (Fig. S2). We found that the cultivar-specific transcript profile of DcTPS01 with highest levels in the orange cultivar overlapped with the metabolite profile of (E)-β-caryophyllene and α-humulene supporting the function of DcTPS01 as an (E)-β-caryophyllene in planta. www.nature.com/scientificreports www.nature.com/scientificreports/ In addition, increased α-terpinolene levels in yellow and orange carrots correlated with the transcript profiles of the α-terpinolene synthase DcTPS03. Several TPS genes exhibited highest transcript levels in the yellow cultivar (Fig. S2). Of these genes, DcTPS26 may contribute to the formation of β-bisabolene in yellow rooted carrots since the DcTPS26 protein lacks a plastidial transit peptide and might make β-bisabolene from (E,E)-FDP in the cytosol (Fig. 4). Three other genes (DcTPS03, DcTPS04, DcTPS54) may have similar roles since their corresponding enzymes are targeted to plastids, where they may contribute to synthesizing γ-bisabolenes and β-bisabolene from (Z,Z)-FDP (Fig. S11). Proteins encoded by other TPS genes with highest expression in the yellow cultivar either did not make bisabolenes or have not been functionally characterized (DcTPS10, DcTPS16, DcTPS33, DcTPS42). No additional correlations between TPS genes expression and profiles of other terpenes were found due to multiple enzymes being involved in the formation of several terpenes (e.g. α-pinene, β-pinene, β-farnesene) or unknown biochemical origin of the compound (bornyl acetate).

Discussion
Carrot (Daucus carota L.) has been extensively studied for its commercial and nutritional value, essential oil content, and resistance against pathogens and herbivores 24,25 . Volatile terpene constituents of carrot essential oil were first analyzed 50 years ago 26 , but their genetic determinants have largely remained unidentified. Here we report on the major terpene volatiles of the orange, doubled-haploid carrot DH1, whose genome was recently sequenced 18 , and identify several TPS enzymes involved with the formation of these compounds in the DH1 and other colored carrot genotypes.
Despite the substantial variation of terpene composition in different carrot genotypes, several of the highly abundant terpenes detected in leaves, petioles, and roots of the DH1 genotype occur also at high levels in other cultivars 19 . These compounds include the monoterpenes α-pinene and β-myrcene and the sesquiterpenes (E)-β-caryophyllene and germacrene D in leaves and α-terpinolene and (E)-β-caryophyllene in roots. DH1 leaves and roots also contain high amounts of δ-elemene and γ-terpinene, respectively, which have been identified at  www.nature.com/scientificreports www.nature.com/scientificreports/ various levels in other cultivars 19,27 . By contrast, bornyl acetate, a typical terpene extracted from carrot roots, was only observed in trace amounts in DH1 root tissue. Compound profiles in the petiole were similar to those in leaves but proportionally fewer terpenes were detected in this tissue. Except for (E)-β-caryophyllene, which is the most predominant volatile in both leaves and roots, DH1 above and belowground tissues maintain distinct terpene profiles 19 . These tissue specific blends differ largely at a quantitative rather than qualitative scale, which suggests possible movement of compounds throughout the plant. As interconnected phloem oil ducts occur in carrot roots, petioles and leaves 6 , it is conceivable that terpenes are mobilized to some extent from roots to shoots or vice versa through schizogenous spaces. The presence of oil ducts in the phloem would suggest that terpene compounds reside mostly in this tissue; however, we did not observe major differences in terpene content between root phloem and xylem under our preparation conditions.
Our initial search of TPS gene models in the DH1 reference genome and publicly available transcriptomes, yielded 43 unique full-length genes. The TPS genes reside on all chromosomes and frequently occur in gene clusters indicating multiple gene duplication events 19 (Table 1, Supplementary Fig. S1). Genes encoding putative cytochromes P450 are associated with some of these clusters (Supplementary Fig. S1) suggesting possible oxidations of terpene olefins although major immediate oxidation products are typically not detected in extracts of carrot tissues. Notably, the type-b clade in the carrot TPS gene family has undergone a substantial expansion in comparison to TPS families of other dicots 28 (Fig. 2) suggesting a selection for monoterpene biosynthetic genes in domesticated carrot. By contrast, the carrot TPS genome contains few di-TPS genes in the TPS-c and e/f clades, two of which (DcTPS25, DcTPS28) could be associated with the formation of CPP and kaurene required for gibberellin biosynthesis. These genes were among 19 out of the 43 genes, which we selected for biochemical characterization based on transcript abundance and the ability to obtain full length cDNAs. qRT-PCR and RNA-seq derived transcript profiles were generally in agreement for root-expressed TPS genes but showed more tissue-specific variation in aboveground tissues. Recently, Keilwagen, et al. 19 identified 22 additional full length TPS genes in the DH1 genome, most of which we did not pursue because of their low transcript levels in roots or inability to amplify full-length transcripts.
The enzymatic products of many of the characterized mono-TPS and sesqui-TPS proteins are present in leaf or root tissues indicating that these enzymes contribute to the detected terpene mixtures depending on their expression profiles and subcellular localization. Several of the recombinant proteins did also convert the cis-isoprenyl diphosphates NDP and (Z,Z)-FDP in vitro. It is unclear whether these diphosphate intermediates are synthesized in carrot tissues and serve as enzymatic substrates in planta. However, a search of the DH1 genome for isoprenyl diphosphate synthases identified two genes that cluster with cis-isoprenyl diphosphate synthases from Solanum lycopersicum and, therefore, may encode enzymes with similar activity (Supplementary Fig. S16).
Besides the previously characterized (E)-β-caryophyllene synthase DcTPS01, we identified TPS enzymes that are most likely responsible or contribute to the formation of five predominant monoterpenes and sesquiterpenes in leaves and roots of DH1 and presumably other genotypes: DcTPS03 produces mostly α-terpinolene and DcTPS30 makes γ-terpinene as its major product. Both γ-terpinene and α-terpinolene have been associated with a sweet, fruity, and citrus like odor 7,29,30 and add to a terpene flavor and burning aftertaste 5 . A correlation of DcTPS03 with the formation of α-terpinolene was also supported from analysis of colored cultivars (see below). However, since DcTPS03 was expressed at fairly low levels in DH1 based on our qRT-PCR results, another uncharacterized root-expressed TPS might contribute to the formation of α-terpinolene in this genotype.
Keilwagen, et al. 19 identified a QTL on chromosome 4 for the monoterpene sabinene and its conversion product terpinen-4-ol, which they associated with a cluster of five closely related genes in the TPS-b clade. Sabinene has been characterized as a compound involved with carrot top aroma 7 . We indeed found four of the genes on this QTL (DcTPS04, DcTPS26, DcTPS54 and DcTPS55), to encode proteins that catalyze the formation of sabinene among other monoterpenes. Three of them are likely to exhibit this activity in vivo because of their targeting to plastids. Other QTLs predicted for γ-terpinene (DcTPS29) and bornyl acetate (DcTPS03) could not be confirmed either because of low transcript abundance (DcTPS29) or different catalytic activity (DcTPS03 makes mostly α-terpinolene) suggesting further refinement of QTL associations will be required. DcTPS04, DcTPS26, DcTPS54 and DcTPS55 may also contribute to the synthesis of α-pinene and β-myrcene in above-and belowground tissues. α-pinene and β-myrcene have been described with a pinene, carrot top odor and a green, terpene like odor, respectively 7 . Among the sesquiterpene synthases DcTPS07, DcTPS11 and DcTPS42 were found to produce germacrene D and DcTPS53 catalyzes the formation of δ-elemene; both compounds are major constituents of leaf volatile terpenes.
We further tested whether terpenes distinctive of selected colored cultivars could be associated with particular TPS genes. Random forest analysis identified several terpene factors with significant differences in four colored cultivars, which likely contribute to the variation in sensory attributes (Figs. 6 and 7). Correlation of TPS gene expression and terpene metabolite profiles of the cultivars supported the function of TPS01 as (E)-β-caryophyllene synthase with highest expression in the orange cultivar, TPS03 as α-terpinolene synthase mostly active in the orange and yellow genotypes, and possible roles of DcTPS26, DcTPS04, DcTPS54, and DcTPS03 in contributing to β-bisabolene and γ-bisabolene formation, respectively, in yellow carrots (Fig. 7,  Supplementary Fig. S2).
Taken together, we have identified genes in the large carrot TPS family that are likely responsible for the formation of predominant terpene compounds in above and belowground tissues including several aroma and flavor constituents in roots. Results from this study may be directly applied in future breeding efforts to improve the sensory quality of carrots.

Material and Methods
plant growth and conditions. Seeds from the doubled haploid orange Nantes type carrot DH1 were kindly provided by Rijk Zwaan and directly seeded into 6-inch clay pots filled with 50% potting mix and 50% composted soil. Seedlings were grown at the University of Wisconsin, Madison, Walnut Street Greenhouse under a 12-h photoperiod with an average temperature cycle of 20-25 °C (night/day). Colored carrot cultivars (yellow-Y9244A, orange-B493B, red-R6637 and purple-7262) were field grown at the University of Wisconsin, Madison. Whole plants were harvested 100 days after planting and frozen immediately in liquid nitrogen for later isolation of RNA and metabolite extraction. Three individual plants from each cultivar were used for extraction.
Identification of TPS genes in the carrot genome. Publically available RNA-seq data for above-and belowground tissues were retrieved from the NCBI Short Read Archive 18 , biosample SAMN03216637, and quality assessed using FastQC. Reads were truncated by nine bp using Trimmomatic 31 to remove low quality sequences and assembled de novo using Trinity 32 . Assembled transcriptomes (20 total) were individually queried with a representative TPS sequence (DcTPS01) using TBLASTX. The resulting "hits" were manually curated for putative functionality based on length and presence of aspartate rich conserved motifs (DDxxD, DxDD). Gene models were refined further by comparing transcripts to genome sequences available in Phytozome (Daucus carota v2.0). Exon/intron structure was predicted by alignment of coding sequences to genomic sequences using the Gene Structure Display Server 33 . Putative N-terminal plastidial transit peptides were predicted from multiple sequence alignments and by analysis of each sequence using the transit peptide prediction software ChloroP 34 . Phylogenetic analysis was conducted in Geneious (v8.0.2) using default settings (bootstrap = 1000) based on multiple sequence alignments generated with Clustal Omega 35 .

TPS Gene expression analysis in DH1 tissues. Initial RT-PCR Analysis of 43 Carrot TPS Genes. Total
RNA was extracted from each DH1 tissue type in biological triplicate (young leaves, fully expanded leaves, petiole, root xylem, root phloem and whole root) using the TRIzol Plus RNA Purification Kit (Life Technologies, Carlsbad, CA) in accordance with the manufacturer's protocol. RNA was treated for DNA contamination with the TurboDNA-free kit (Life Technologies, Carlsbad, CA) and used for first strand cDNA synthesis with SuperScriptII reverse transcriptase and oligo(dT) 18 primers (Invitrogen) according to the manufacturer's instructions. PCR amplification of 43 TPS genes was performed with each cDNA, gene specific primers (Supplementary Table S3), and Taq DNA Polymerase (Promega) with an initial denaturing step of 95 °C for 5 min, followed by denaturation for 30 s at 95 °C, annealing for 30 s at 50 °C, extension for 1 min at 72 °C and a final extension for 7 min for 30 cycles. Actin and PP2A were used as internal controls.
qRT-PCR analysis of transcript abundance of 43 TPS Genes. Total RNA extraction and first-strand cDNA synthesis were performed as described above, however RNA was first normalized between samples and replicates to 2.5 µg based on denaturing gel electrophoresis and spectrophotometer measurements at 260 nm. The resulting cDNA was diluted to 100 ng/µl. Reactions were performed with 1 µl cDNA in a 20 µl reaction using Power SYBR Green PCR master mix (Applied Biosystems) and gene specific primers (Supplementary Table S3). PCR amplifications were done with a CFX96 Touch real-time PCR detection system (Bio-Rad) with the following cycles: 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s, 50 °C 30 s and 60 °C for 1 min. Melt curve analysis was performed at the end of amplification to ensure specificity of each primer pair. Relative expression levels across tissues for each TPS gene were calculated using the relative quantification method and normalized to actin 36 .

RNA-seq Analysis of TPS Gene Expression in Colored Carrots.
Total RNA was extracted from 14 week old whole roots, of colored carrot cultivars (B493B, R6637, Y9244A, P7262), with three roots (i.e. three biological replicates) per sample set. Total RNA was extracted using the TRIzol Plus RNA Purification Kit (Life Technologies, Carlsbad, CA) following the manufacturer's protocol. DNA was removed with the 'DNA free-kit' provided with the RNA purification kit. RNA quantification was measured on a Nanodrop One Spectrophotometer and quality control was done on an Agilent 2100 Bioanalyzer RNA NanoChip. For each RNA sample, libraries were prepared at the University of Wisconsin-Madison Gene Expression Center and sequenced on an Illumina HiSeq 2000 using 1×100 nt reads. After quality control with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), reads were filtered with Trimmomatic version 0.32 with adapter trimming and using a sliding window of length ≥50 and Phred quality score ≥28 31 . Reads were mapped against the carrot genome sequence (GenBank accession LNRQ01000000.1) using Bowtie2 (Langmead and Salzberg 2012) 37 . Illumina reads were mapped against the carrot genome sequence (GenBank accession LNRQ01000000.1) using Rsubread version 1.24.2 38 . Transcript expression was analyzed using the Bioconductor package limma 39 .

Amplification of full-Length tpS cDnAs and plasmid construction. Full-length cDNAs for
DcTPS07, DcTPS11, DcTPS15, DcTPS19, DcTPS23, DcTPS26, DcTPS42, DcTPS53 and those truncated based on predicted transit peptide coding regions DcTPS03, DcTPS04, DcTPS10, DcTPS14, DcTPS25, DcTPS27, DcTPS28, DcTPS30, DcTPS48, DcTPS54 and DcTPS55 were obtained by PCR-amplification with gene-specific primers carrying restriction sites (Supplementary Table S4). Template cDNA was derived from root and stem RNA as described above. Amplification was performed with Q5 High-Fidelity DNA polymerase in a 25 µl reaction volume with the following PCR conditions: 98 °C for 30 s, followed by 30 cycles of 98 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min 45 s and a final extension at 72 °C for 2 min. Amplified fragments were gel purified using a NucleoSpin Gel and PCR clean-up kit (Macherey-Nagel, MN) and concentrated to ~10 µl. A 10 µl A-tailing reaction was prepared with 3 µl of purified PCR product incubated in the presence of 10 mM dATPs and Taq polymerase at 72 °C for 30 min. The resulting product was ligated overnight into the pGEM-T Easy vector (Promega) and Sanger sequenced to verify the insert. Open reading frames were then digested with the appropriate restriction enzymes www.nature.com/scientificreports www.nature.com/scientificreports/ (typically BamHI, XhoI) and ligated overnight into the corresponding restriction sites of the bacterial expression vector pET28a (Novagen). Constructs were Sanger sequenced again prior to expression in Escherichia coli.

Recombinant protein expression in E. coli and tpS assays. Plasmids were transformed into E.
coli BL21-CodonPLus(DE3) cells (Stratagene), and individual colonies were selected for inoculation into 5 ml Luria-Bertani (LB) media supplemented with 50 µM kanamycin and grown at 37 °C/220 rpm overnight. The following day, 1 ml of the saturated overnight culture was transferred to 100 ml LB media supplemented with 50 µM kanamycin and grown at 37 °C/220 rpm until the optical density reached 0.5-0.7. After cooling cultures at ~25 °C for 30 min, 0.5 mM isopropyl 1-thio-ß-D-galactopyranoside (IPTG) was added to induce protein production and cultures were incubated at 18 °C/220 rpm for 16 h. Cell pellets were washed with 10 mM Tris base and 50 mM potassium chloride, resuspended in 4 ml phosphate buffered saline (PBS, 50 mM sodium phosphate, 100 mM sodium chloride, 10% glycerol) supplemented with 1 mM dithiothreitol (DTT) and 0.5 mM phenylmethylsulfonyl fluoride (PMSF), and lysed by sonication. Clarified extracts were mixed with equal parts PBS and recombinant His(6×)-tagged proteins were partially purified by Ni 2+ affinity chromatography according to the manufacturer's instructions (Qiagen). Partially purified proteins were then desalted on PD-10 desalting columns (GE) equilibrated with assay buffer (10 mM MOPSO, 10% glycerol [v/v] and 1 mM DTT, pH 7.0) and visualized by SDS-PAGE (10%, GenScript). In vitro enzyme assays were prepared by combing Ni-NTA purified protein with 20 mM MgCl 2 and 60 µM commercially available prenyl diphosphate substrates (Echelon Biosciences) in a 125 µl reaction volume in a 10 ml screw cap vial (Supelco). Vials were immediately sealed and incubated for 5 min at 30 °C in the presence of a 100-µM polydimethylsiloxane fiber (Supelco) using automated solid phase microextraction (SPME, AOC-5000 Shimadzu). Incubation was extended to 40 min in assays with DcTPS04, DcTPS53, DcTPS54, and DcTPS55 proteins. Volatile compounds were eluted by thermal desorption for 4 min at 240 °C and separated and analyzed by gas chromatography mass spectrometry (GC-MS-QP2010S, Shimadzu). Eluted compounds were separated on a Zebron capillary column (30 m x 0.25 mm i.d. x 0.25 µm, Phenomenex) in a 5:1 split using Helium as the carrier gas (1.4 ml min −1 flow rate) and a temperature gradient increasing from 40 °C (2 min initial hold following injection) to 220 °C at a rate of 5 °C min −1 . Identification of major volatile compounds was confirmed by comparisons of retention times and mass spectra to authentic standards when available (Sigma), mass spectral libraries (Wiley and NIST) and Opopanax essential oil (Floracopeia).

DcTPS25
Assay. Diterpene cyclase activity of DcTPS25 was tested by incubating partially purified protein as described above with 60 µM GGDP and 10 mM MgCl 2 for 1 h at 30 °C with the addition of a 1 ml hexane overlay. Following incubation, 80 µl of 5 M HCl or water was added and mixed by vortex to facilitate acid hydrolysis of terpene products. Hexane fractions were dried over magnesium sulfate (MgSO 4 ), concentrated to ~40 µl, and 1 µl was injected into the GC-MS as described above. Identification of ent-copalol was confirmed by mass spectral comparisons to acid hydrolysis products from the known CPS from Arabidopsis thaliana 23 .

DcTPS28
Assay. Diterpene synthase activity of TPS28 was tested as described above, either alone or co-incubated with partially purified TPS25, and by co-expression of pET28a-DcTPS28 with a pGGeC plasmid (provided by Dr. Reuben Peters), which carries a GGDPS gene from Abies grandis and a CPS gene from Arabidopsis thaliana 23 . Constructs, including a known ent-kaurene synthase gene from Bradyrhizobium japonicum as a control (pDEST15-BjKS courtesy of Dr. Reuben Peters), were co-transformed into E. coli C41 (DE3) OverExpress cells (Lucigen) and a single bacterial colony was selected to inoculate 5 ml LB media and incubated for 16 h at 37 °C. The saturated culture was used to inoculate a 50 ml TB culture, which was incubated at 37 °C until the OD 600 reached 0.5-0.7. Protein expression was induced by the addition of 0.5 mM IPTG and incubated with shaking at 18 °C for 72 h. Cultures were extracted with equal parts hexane, dried over MgSO 4 , concentrated to ~40 µl, and 1 µl was injected for GC-MS analysis as described above. Identification of ent-kaurene was achieved as described above, and by comparisons to the ent-kaurene product produced by the BjKS.

GC-MS and GC-FID analysis of terpenes from plant tissues.
Volatile terpenes were extracted from 1 g of leaf, petiole, root phloem, root xylem and whole root samples each from three individual plants (DH1) grown under culture conditions described above. Samples were rinsed with deionized water, dried with tissue paper and immediately frozen in liquid nitrogen for processing. Samples were then ground to a fine powder for 2 min in the presence of liquid nitrogen, weighed, transferred to 5 ml hexanes and mixed by vortex for 20 s. The ground material was placed in an ultrasonic bath (Fisher Scientific) for 10 min and then pelleted by centrifugation. Following the collection of two fractions, 1-bromodecane was added for a final concentration of 20 ng/µl as an internal standard and extracts were dried over a MgSO 4 column and concentrated on ice to ~40 µl. Extracts were separated as above with a 10:1 or 40:1 split using the same column and conditions above, and by GC-FID (Thermo Finnigan) using Helium as the carrier gas (1.4 ml min −1 flow rate) and Nitrogen, Hydrogen and Air (25,35, 350 ml min −1 , respectively) as makeup and combustion gasses. Annotation of major terpene compounds was achieved as described above. Chromatograms were compared between GC-MS and GC-FID results for compound identification and quantification using the multipoint internal standard method (Alltech). Standard curves for monoterpene and sesquiterpene compounds were constructed with authentic α-pinene and α-humulene (Sigma), respectively, and obtained values were normalized to gram fresh weight. Analysis of volatile compounds for colored carrot cultivars (B493B, R6637, Y9244A, P7262) followed identical methodology with the exception that compounds were only analyzed in roots by GC-MS and reported as normalized relative abundance ((peak area analyte/peak area internal standard)/gram fresh weight).

Random forest analysis and boruta factor selection.
To assess the importance of major terpene compounds from roots in distinguishing among colored carrot cultivars (see above), relative terpene abundances from each sample (n = 3 per cultivar) were analyzed by random forest classification models, followed by variable (2020) 10:9989 | https://doi.org/10.1038/s41598-020-66866-1 www.nature.com/scientificreports www.nature.com/scientificreports/ selection using the Boruta algorithm in R v3.5.0 40,41 . Random forest is a machine-learning classification method that builds sets of decision trees from bootstrapped subsets of the entire sample set. Each tree classifies a subset of samples according to a random sample of their attributes (in this case different VOCs) and then calculates the classification error according to the remaining unselected samples. By averaging across thousands of iterative trees, this method provides a robust estimation of which compounds are most important in distinguishing among groups 42 . Random forest analysis was set to 5000 bootstrap iterations for compound selection. To further assess the direction and significance of compound differences across cultivars, compounds selected in bootstrapped models were retained for use in a MANOVA model conducted across all selected compounds. Based on significant multivariate differences among cultivars (F = 16.936, p = 6.6e-5), this was followed with ANOVAs and Tukey-HSD post-hoc comparisons of compound levels among cultivars.

Data availability
RNA-seq data sets used for TPS gene identification and expression in D. carota DH1 are available in the NCBI Short Read Archive (SRA) 18 , biosample SAMN03216637. RNA-seq data sets for the colored carrot cultivars are available in the SRA database under BioProject accession number PRJNA594937.