Leaf nutrient content and transcriptomic analyses of endive (Cichorium endivia) stressed by downpour-induced waterlog reveal a gene network regulating kestose and inulin contents

Endive (Cichorium endivia L.), a vegetable consumed as fresh or packaged salads, is mostly cultivated outdoors and known to be sensitive to waterlogging in terms of yield and quality. Phenotypic, metabolic and transcriptomic analyses were used to study variations in curly- (‘Domari’, ‘Myrna’) and smooth-leafed (‘Flester’, ‘Confiance’) cultivars grown in short-term waterlog due to rainfall excess before harvest. After recording loss of head weights in all cultivars (6-35%), which was minimal in ‘Flester’, NMR untargeted profiling revealed variations as influenced by genotype, environment and interactions, and included drop of total carbohydrates (6–50%) and polyols (3–37%), gain of organic acids (2–30%) and phenylpropanoids (98–560%), and cultivar-specific fluctuations of amino acids (−37 to +15%). The analysis of differentially expressed genes showed GO term enrichment consistent with waterlog stress and included the carbohydrate metabolic process. The loss of sucrose, kestose and inulin recurred in all cultivars and the sucrose-inulin route was investigated by covering over 50 genes of sucrose branch and key inulin synthesis (fructosyltransferases) and catabolism (fructan exohydrolases) genes. The lowered expression of a sucrose gene subset together with that of SUCROSE:SUCROSE-1-FRUCTOSYLTRANSFERASE (1-SST) may have accounted for sucrose and kestose contents drop in the leaves of waterlogged plants. Two anti-correlated modules harbouring candidate hub-genes, including 1-SST, were identified by weighted gene correlation network analysis, and proposed to control positively and negatively kestose levels. In silico analysis further pointed at transcription factors of GATA, DOF, WRKY types as putative regulators of 1-SST.


Introduction
Curly-and smooth-leafed endives (Cichorium endivia var. crispum and var. latifolium) are consumed worldwide as fresh or minimally processed salads that are sources of healthy nutrients 1 and good-profit greens in the export of EU major producers such as Spain, France and Italy (TrendEconomy, http://trendeconomy.com). The Italian cultivation of endives (http://dati.istat.it/) occurs mostly in open field (97% of 8426 ha and yield of 1886738 q in 2019) in autumn-winter cycles because it is low input and cold-tolerant crop 2 . Precipitation excess and soil inadequate drainage cause root waterlogging that dramatically affect product yield and quality. Waterlog stress is characterized by lower oxygen and nutrient availability (e.g. leaching of nitrogen fertilizers) together with enhanced plant susceptibility to diseases favoured by congenial conditions for pathogens 3 .
In outdoor cultivation, oxygen deficiency directly affects the root while the shoot remains oxygenated, underground and aerial organs react differently with anatomical and metabolic modifications 4 . Overall, the waterlog impact on plant metabolism varies with severity, timing and duration of stress and with genotype tolerance 5 . Communication from anoxic root to shoot occurs via xylem 6 , throughout root signals such as tricarboxylic acids (TCA) 7 and hormones 8 . Leaf responses include stomatal closure and non-stomatal metabolic alterations (e.g. oxidative stress) that lead to decreased CO 2 incorporation and net photosynthesis drop, while the bleaching is recurrently associated with chlorophyll loss and senescence. Frequently, leaves of sensitive crops undergo stronger alteration in sugar, amino acids and TCA metabolism. On the contrary, tolerant species, which are more efficient in carbohydrate utilization, the show raised levels of fumarate, γ-aminobutirric acid and alanine belonging to routes tailored to compensate anoxia damages 9 .
Inulin has prebiotic and healthy properties, has been used in food and non-food applications, and frequently extracted from the taproot of chicory (C. intybus var. sativum), which has been a model system to study or modify inulin metabolism 10 . Chemically, root chicory inulin is a fructan-type polymer made of fructose subunits linked by ß-2,1 bonds ending with a α-linked glucose (reduced form, lacking the terminal glucose, can also be found). The degree of polymerization (DP) of the subunits varies from 2 to 60, and its partial enzymatic hydrolysis products (DP 2-8) are also named oligofructose or fructooligosaccharides (FOS). Being water soluble, fructans can localize in vacuole, apoplast and xylem, and mainly represent energy storage with extended roles in stress protection 11 . In leaves, inulin occurs at much lower levels and DP than roots; for instance, witloof leaves contain about 14 vs 1300 mg g −1 dry weight and 3-5 vs 18-20 DP compared to roots; relatedly, endive leaves were showed comparable levels with witloof 12 . Finally, inulin content of chicory leaf increases with maturity 13 or by sucrose induction 11 . Given the importance of vegetable intake in human diets, more information on inulin and FOS of Cichorioideae salads is envisaged.
In chicory, inulin is under control of the biosynthetic and sequentially acting enzymes (fructan active enzymes, FAZYs) sucrose:sucrose 1-fructosyltransferase (1-SST) and fructan:fructan 1-fructosyltransferase (1-FFT), and of the depolymerizing 1-fructan exohydrolases (1-FEH). The 1-SST transfers fructose from donor sucrose to second sucrose, which accepts the fructose to produce free glucose and 1-kestotriose. The latter is a fructose donor that allows the 1-FFT elongating the inulin chain and releasing free sucrose. The 1-FEH specifically acts on ß-2,1 bonds, breaking fructans into sucrose and fructose, and three isoforms named I, IIa and IIb occur in chicory 14,15 . The genes of the above-mentioned enzymes were characterised at the genomic, phylogenetic and expression levels, and positive correlation among mRNA abundance, enzyme activity and end-product strongly supported tight control at the transcriptional level 16 .
Few studies have focused on metabolic changes of Cichorioideae salad heads as challenged by waterlog 17,18 . Recently, integrated omics technologies have been helpful to get insights in nutrient, physiological and molecular aspects of endives 19 . In this work, untargeted metabolite profiling by NMR and transcriptomic analyses were used to quantify the changes of thirty compounds in leaves of endives harvested after natural waterlog, address the transcriptomic response variability among smooth-and curly-cultivars, characterise gene networks that regulate inulin contents, and provide indication on endivegenotypic performance in case of precipitation excess.

Results and discussion
Rainfall excess and waterlog affect the productivity of smooth-and curly-leafed endives Two summer-fall productive cycles of endives were carried out at comparable conditions on the same parcel (Table S1) in 2011 (Y1) and 2012 (Y2). Climate analyses of Y2 vs Y1 showed significant differences in rainfall (Fig. 1a) and humidity but not in temperature values (Fig. 1b). During the entire cultivation cycle in Y2 (Table S2), a doubled number of rainy days together with heavy rainfall lead to +233% rain surplus compared to Y1 (up to +245% within 15 days before harvest). Consistently, relative humidity in Y2 was at least 10% higher than Y1 (Fig. 1b). The rainfall excess in Y2 (11-12/11/2012) caused waterlogging (soil oversaturation, 0 kPa) that involved only the root-zone 20 and was recorded to last 72 h until soil draining (field capacity; 10kPa) was restored. The variation of head weights (HW) and the dry vs fresh weight ratio (DW/FW) of leaves were analysed ( Fig. 1c) with respect to the cultivation year (Y) and genotype (G). The Y and GxY effects were significant on HW, while only Y was significant for DW/FW, and no variation was due to G. In Y2, the HW loss was −35%, −14%, −30% and −6% respectively in Domari, Myrna, Confiance and Flester (D, M, C and F); the DW/FW loss was −29% in M and −32% in the others. Weight loss of all cultivars recalled behaviour of crops highly susceptible to waterlog 21 , even though endive harvest occurred in a presumed recovery stage. In indoor experiments, root chicory was shown to compensate intermittent flood stress by increasing leaf number and modifying root shape 22 . Here, it is speculated that the different sensitivity of endives might rely on the root system made of abundant fibrous roots prevailing over a small taproot in endive, while the opposite occurs in industrial chicory. Finally, indoor-induced waterlog caused decrease of lettuce biomass in a cultivardependent manner 17 , while GxY effects were most significant for endives in our conditions.

Untargeted metabolic profiling by NMR
Metabolic phenotyping of D, M, C and F leaves was achieved by NMR assignment of 30 hydrosoluble compounds (Table S3), grouped into carbohydrates, organic acids, amino acids, polyols, phenylpropanoids and other compounds, and quantifications (Table 1) referred to Y1 and Y2. The data set was explored by PCA (Fig. 1d), the PC1 and 2 respectively explained 46.2 and 26.8% of the total variance. Specifically, the PC1 separated the Y1 from Y2 products (positive and negative values of D I , M I , F I and  C I vs D II , M II , F II and C II ), and pointed at total carbohydrates (GLC, KES, INUL, SUC and FRU) and phenols (CHA, MCTA) as the most diverging variable groups. Moreover, the biplot diagonal from bottom left to up right (not shown) depicts the GxY effects due to PC1 and 2 contributions. The metabolic (and transcriptomic) analysis was narrowed to edible product; consequently, it could not directly address variations/responses that had occurred in root, stem and organ interplays.

Carbohydrates
The ANOVA deepened data exploration, and min to max value ranges hereafter refer to mg g −1 dry weight. Synoptically (Table 1), FRU (84.5-148.2) was the most abundant carbohydrate, followed by αand β-GLC (16.2-40.8 and 31.1-77.1, respectively), SUC (9.6-34.6), and by KES and INUL (0.8-8.6 and 0.6-2.6, respectively). Y influenced the content variation of all sugars; the G effect was specific on INUL, KES and SUC, while the GxY acted on all of them except for FRU. The CAR-T loss in Y2 was −20% in D, −33% in C, −44% in M, and −48% in F as compared to Y1 production. Focusing on KES and INUL contents, in Y2 they dropped intensely and differentially according to cultivars (INUL, −38% in D and F to −65% in M and C; KES, over −50% in D and F and over −80% in M and C), and their contents showed a strong positive correlation (Fig. S1). From now on, discussion themes will mainly refer to literature on endive using conversion factors (Table 1 legend reports DW to FW formula) and to leaf metabolism as consequence of waterlog. Two sources 23,24 reported on GLC/FRU/SUC ranges in endives that were consistent with this work. Moreover, inulin ranges were measured in outdoor curly endive 12 and consistent with KES + INUL levels of D and M curly types. Regarding flavour (Table S4), the computed sucrose sweetness equivalency (SSE) was mainly affected by GLC, FRU and SUC variations, given that INUL and KES are only 2-6% of total sugars and have low relative sweetness power (0.10-0.22 and 0.33 vs SUC as unit reference). Y and G×Y (but not G) influenced SSE just as CAR-T contents, and SSE decreased up to −46% in M and F in the rainier Y2. Moreover, no significant association occurred between sugar content/SSE and leaf phenotype. In addition, considering the INUL + KES contents of 0.07-0.86 mg g −1 FW, one hundred gram serve of endive provides much <20 g inulin that is the human well-tolerated daily dose 25 . G effects are known for dietary fibre of endives 26 consistently with this work. Waterlog susceptible species usually show decreased levels of leaf GLC/FRU/SUC oppositely to tolerant ones 27,28 . The carbohydrate drop in endives confirms vulnerability to soil flooding as observed in taproot chicory 22 . However, SUC levels undergo complex variations since accumulation and loss were respectively measured in soy 29 and chicory leaves 22 in response to waterlog. Finally, sugar export from leaves is necessary to sustain the increased glycolysis for ATP production in hypoxic roots 30 , supported by the evidence of concurrent decrease of SUC in leaf and increase in the phloem sap of hypoxic plants 31 . Contextually, we propose that the carbohydrate loss in endives may derive from both reduced synthesis and enhanced export from leaves responding to short-term floods.

Organic acids
MA, CA and TA showed the highest levels (Table 1), and G, Y and GxY effects were significant on SA and TA, (see figure on previous page) Fig. 1 Climate condition, leaf parameters and metabolite explorative analysis. a-b Climate data during cultivation in 2011 and 2012. a Cumulative rainfall and rainy days (left y-axis), and soil humidity (right y-axis) one month before harvest. b Boxplot of relative humidity (RH) and temperatures (T). Significance by Student's t-test: *, ** = significant at P ≤ 0.05, 0.01, respectively; ns, non-significant. c-d Product parameters and principal component analysis of leaf metabolites (PCA). c Histograms of head weights (left panel) and dry vs fresh weight ratios of leaves (right panel). Effects of genotype (G), year (Y) and interaction (G×Y) by two-way ANOVA are below each panel. *, **, *** = significant at P ≤ 0.05, 0.01 and 0.001 respectively; ns non-significant.  but not on FA and LA; CA was affected only by G and MA only by GxY. The total mean levels raised 2-4% in M and C, 22-30% in F and D in Y2. The MA and CA ranges reported for a curly type were comparable to this work, while SA was measured as 10-fold higher 23 (here 0.001-0.003 g 100 g −1 FW). We were unable to retrieve literature data on FA, LA and TA ranges in endives, however, our values (6-150 mg 100 g −1 FW) fell in the ample variations of leaves (0.55-685.28 and 750-3000 mg 100 g −1 FW) of numerous species 1,32 . Shoots of flooded crops can show an increase in fermentation metabolites such a SA 6 and LA 4 , typically accumulated in hypoxic roots, suggesting the TCA and lactate routes may be activated in leaves. However, LA and SA enrichment in sap imported from waterlogged roots 6 cannot exclude channelling into leaves. OA metabolism of leaves responding to flooded roots varies with species and stress duration, e.g., chicory leaves showed MA and CA accumulation 22 contrary to soy and Arabidopsis responses 4,6 .
Regarding endives, the SA increase (85-189%) in all cultivars from the rainier Y2 is consistent with waterlog effects; however, key genes of anaerobic pathways were not triggered in leaves (see the following sections) and other routes must have been involved. Finally, postwaterlogging endives shared TA raised levels (49-120%), which has not been reported to our knowledge so far.

Amino acids
ASN and GLN had the highest contents followed by GLU and ASP, while the other amino acids (AA) were below one mg g −1 DW (Table 1). G and GxY affected the levels of all AA; Y had also strong effects on most AA, though non-significant on ASN, ASP, GLN, GABA. The AA-T level raised by 8-15% in D and C, and decreased by 32-37% in M and F in Y2. Databases on endive (https:// fdc.nal.usda.gov; http://www.fao.org) just report mean values that are ca. 10 fold higher than in this work, whereas consistency was found with surveys addressing season and genotype effects in endive 33 . In roots, the AA metabolism develops in complex manner depending on species and stress conditions and is usually characterised by the decrease of most AA 6 and raise of anoxic responsive ones such as ALA or GABA 4 . The yearly variation of these latter was genotype specific in endive leaves, indeed, D and C showed gain in GABA and ALA (+9 and 32%; +22 and 53%, respectively) differently from M and F (−21 and −28%; −3 and 0% respectively). Finally, most endives showed drop (up to −42%) of NO 3 content in Y2 vs Y1 harvest (Table S5). This is may be due to N depletion in waterlogged soils (e.g. by microbe denitrification, leaching), root damaging, and impairment of nitrate efficiency use as suggested by the altered gene expression in the "nitrate assimilation" gene-ontology term (GO:0042128 in Table S6).

Polyols
MI content (2.4-5.7) exceed CI and SI ones (Table 1), and the total levels dropped in the rainier Y2 (up to −37% in F); all variables were under G but not Y effects, while GxY affected only CI and MI. In literature 24 , endive inositol amounts varied comparably to this work, except for the MI which was over three-fold higher in this work (11.8-44.3 mg 100 g −1 FW) and an added value considering its healthy properties 35 . In Y2, the MI decrease was concurrent with its precursor GLU and with CI and SI increase in all cultivars, suggesting that MI conversion into CI and SI might be a specific stress response 36 . We could not retrieve literature data on QA contents in endive, while chicory leaf ranges 37 were slightly higher than our samples (3.8 ± 0.2 vs 1.1-2.9 mg 100 g −1 FW).

Others
CHN and ETA contents were affected by G, Y and G×Y (Table 1). Their total levels significantly increased from 9 to 52% in all cultivars in Y2. ETA is the CHN precursor; both are essential dietary nutrients 38 and literature ranges (https://www.ars.usda.gov) available for CHN in lettuce (6.7-9.9 mg 100 g −1 FW) were in agreement with endive. The content increase in Y2 may relate to their functions in preserving membrane integrity and in ROS scavenging during stresses affecting cell osmosis 39 .

Features of endive enhanced transcriptome and annotation
The lettuce genome sequence 40 was helpful to guide the assembly and enhance a previous transcriptome of endive 19 to generate a new version (v.2) through a merging strategy (the detailed pipeline is in the materials and methods); Table 2 reports evaluation metrics for each assembly. The v.2 consisted of 49058 sequences characterized by mean contig length, N50 and N90 values of 1439, 1795 and 804 bp, respectively. Moreover, v.2 had a higher proportion of sequences longer than 1000 bp, increased completeness (92.4% vs 89.8%) and single-copy genes (80.6% vs 65.6%), decreased ratio of duplicated (10.8% vs 24.2%), fragmented (2.5% vs 3.9%) and missing (4.9% vs 6.3%) sequences than v.1. In addition, v.2 had the highest value of full-and nearly full-length sequences (37.7% and 72.9%) followed by lettuce genome-guided assembly (LGA) and v.1 (18.7 and 42.9%; 16.7 and 41.7%, respectively). The v.2 metrics outperformed LGA and v.1, except for the read mapping-back ratios that were only slightly lower than v.1. Referring to the lettuce genome, ca. 70% of v.2 transcripts aligned for over 70% in length (Table S7), reflecting the species relatedness and over 98% of mapped transcripts were on the nine lettuce chromosomes (<1.5% on unplaced scaffolds). Eventually, v.2 functional annotation included gene ontology terms for 35658 transcripts (Table S7). The approach of knowledge transfer from model (lettuce) to non-model (endive) species to reconstruct transcriptome of the latter was successful to by-pass, together with the use of de novo assemblers, bottlenecks (e.g. fragmentation, redundancy and chimerism) that affect ortholog/paralog resolutions, gene-expression quantifications and distance matrix construction 41 . Ultimately, the merging strategy was the best to minimize contigs redundancy and produce a higher quality assembly.

Analysis of differentially expressed genes (DEGs)
Cultivar-dependent transcriptomic variation is reported by comparison of Y2 vs Y1 (Fig. 2). Overall, the number of genes that varied expression was higher in curly than smooth types (6535-8151 genes in Myrna and Domari; 1274-3500 in Flester and Confiance). Specifically, 4005, 4157, 2279 and 541 genes were upregulated and 2530, 3994, 1221 and 733 were downregulated respectively in D, M, C and F (Fig. 2a). The four cultivars differed for the number of private DEGs (underlined values in Fig. 2b and c) supporting the activation of cultivar-specific pathways in stress response; Flester showed the DEGs lowest number, which might reflect either less susceptibility/rapid recovery to waterlog, consistently with the lesser weight loss than the other cultivars. Totally, 384 transcripts showed a conserved differential expression pattern independently of genotype (163 up-and 221 downregulated, Fig. 2b and c), and they were named core-DEGs. Gene ontology (GO) enrichment analysis revealed that the core-DEGs were over represented in 182 terms (Table S6) from which the top 20 biological processes were pictured in Fig. 2d. As lettuce and endive share common pathogens, the transcriptomic analyses of endive orthologs to lettuce responding to infections (Table S7) were addressed and the comparison between our data with those of transcriptome variations in lettuce-pathogen interactions did not bring out the response of genes 3,42 or GO terms typically associated to disease 43 . Hence, the data reinforced that stress sources other than waterlog were minimal and waterlog was likely to be the prevailing cause of the "omic" differences, and consistently, many enriched GO terms of endive recur in leaf transcriptional responses of waterlogged plants 8,44 . Indeed, DEG-enriched terms about ethylene, jasmonate, and ABA typically characterize waterlog-induced hormonal responses 8 . For example, the seven responsive DEGS here found included the  . Moreover, considering that photosynthesis inhibition is accompanied by the accumulation of reactive oxygen species, the top enriched "oxidationreduction process" term (GO:0055114, Fig. 2d) may further reflect the association of the two events in stressed endives. Finally, the transcriptional reprogramming of genes that regulate sugar metabolism 44 in waterlog-injured plants is here sustained by the scoring of carbohydrate and sucrose metabolic processes terms (GO:0005975 and GO:0005985), which are mostly characterized by sugar gene downregulation together with carbohydrate loss in stressed endives ( Fig. 2d and Table S6).

Genes and proteins of sucrose metabolism and the route to inulin
The drop of SUC/KES/INUL contents was common to all cultivars in Y2 (Table 1) and the transcriptional variations subtending this event were addressed by assembling the sucrose branch to fructans inclusive of sugar transporters (Fig. 2e). The route to sucrose was accomplished with transcripts of seven major enzyme classes: six hexokinases (HK), five fructokinases (FK), three UTP-glucose-  (Table S7), the endive 1-SSTs transcripts mapped on chromosome (chr) 9 and 2, the 1-FFT was on chr2, the 1-FEHs were close but spatially separated on chr5. These data suggest they were all distinct alleles rather than spliced forms of the same gene.

Gene modules acting on the inulin route
Gene regulation underlying carbohydrate content variations were investigated through a weighted gene co-expression network analysis (WGCNA) using carbohydrate metabolic process (CMP, GO:0005975) and (see figure on previous page) Fig. 2 Differentially expressed genes (DEGs) and inulin pathway in waterlogged cultivars. a Histogram reporting the total number of DEGs in Domari, Myrna, Flester and Confiance. b-c Venn diagrams of private and shared DEGs among the four cultivars. The numbers of up-(b) and downregulated (c) genes are reported. Cultivar-specific DEGS are underlined; genes with common expression trends fall in overlapping areas, core-DEGs were in bold (163 up-and 221 downregulated genes that conserved differential expression pattern independently of genotype). d Top 20 GO terms for the biological process category. False discovery rates were calculated. *, **, *** = significant at P ≤ 0.05, 0.01 and 0.001 respectively. e Genes of sucrose and fructan metabolism. Differential expression patterns in Domari, Myrna, Flester and Confiance in Y2 vs Y1 (D2vsD1, M2vsM1, F2vsF1 and C2vsC1) were highlighted by heatmaps according to log 2 fold change. Dashes indicate transcripts not expressed in leaves. Glucose-1P glucose 1-phosphate; Glucose-6P, glucose 6-phosphate, UDPG uridine diphosphate glucose, 1-FEH fructan exohydrolase, 1-FFT fructan:fructan-1 fructosyltransferase, 1-SST sucrose:sucrose-1 fructosyltransferase, FK fructokinase, HK hexokinase, INV invertase, pGlc-T plastidic Glucose translocators, SPP sucrose-phosphate phosphatase, SPS sucrose-phosphate synthase, SS sucrose synthase, STP sugar transporter protein, SUT sucrose transporter, SWEET Sugars Will Eventually be Exported Transporter, TMT tonoplast monosaccharide transporters transcription factor (TF) genes (Table S7). Overall, six modules of co-expressed genes were identified (Fig. 3a) and organized into two large meta-modules (Fig. 3b) based on correlation relationships and named Meta1 (yellow, blue and brown modules) and Meta2 (turquoise, black and green). The modules showed a positive correlation within each meta-module and a negative correlation with those of the other meta-module. Moreover, Meta1 and Meta2 showed respectively negative and positive correlations with sugar amounts (Fig. 3c). Focusing on relationships between modules and carbohydrate contents, the highest positive correlation occurred between the turquoise one and kestose (r = 0.78, p = 1 × 10 −5 ). The turquoise module consisted of 514 transcripts (Table S8) and included 1-SST (CICEN011043.1), 1-FFT (CICEN014948.1) and 1-FEHI (CICEN009400.1). The significant positive correlation between 1-SST and 1-FFT and KES was previously observed in chicory 51 . The blue module showed the maximal negative correlation with KES (r = −0.77, p = 5 × 10 −5 ) and with the turquoise module (r = −0.89, p = 2 × 10 −10 ). The blue module consisted of 406 genes, including the chicory orthologues of CiMYB5 (CICEN021640.1) and CiMYB3 (CICEN012065.1), which are R2R3-MYB factors known to control 1-FEH, 1-SST and 1-FFT in stress response 52 . Those transcripts with the highest correlation with KES (Fig. 3d), which were parametrized by gene significance (GS), showed the maximal centrality within the module as measured by module membership (MM). To downsize the number of candidate genes in KES regulation, transcripts of blue and turquoise modules were filtered by the simultaneous occurrence of highest GS, MM and intramodular connectivity. The filtered transcripts formed the module genes of interest, named MGI. The turquoise-MGI consisted of 21 TFs and 10 genes of the carbohydrate metabolic process (Table S8) and this latter included 1-SST. The positive correlation between turquoise-MGI vs KES contents and a prevailing trend of gene downregulation in waterlogged endives suggest that the module has an inductive role in KES accumulation. Conversely, the blue-MGI, which harboured 13 TFs and negative correlation vs KES amounts, showed a main gene upregulation that suggests a repressive function. Finally, the complexity of TF and carbohydrate genes relationships were depicted for blue-and turquoise-MGI in Fig. 3e.

Computational prediction of TFs controlling 1-SST
The 1-SST repression strongly associated with KES content drop in waterlogged endives (Table S8), hence a transspecific computational approach was used to identify candidate TFs (Table 3) acting on CICEN011043.1/1-SST. Given that the endive transcriptome just contains transcribed DNA, the promoter regions of 1-SST from chicory and lettuce were used to score conserved binding sites (BS) known to be targeted by various TF classes, and those included in the blue and turquoise MGIs were reported. After filtering for BS n. ≥10 in the promoter of chicory 1-SST, the best candidates were GATA (CICEN007985.1/GATA9 and CICEN020509.1/GATA15) and DOF (CICEN019145.1/CDF3 and CICEN025406.2/ DOF3.4) families within the turquoise module, and WRKY types (CICEN030208.1/WRKY6 and CICEN011950.1/ WRKY75) in the blue group.

Conclusions
Rainfall excess induced waterlog and affected the yield of four endive cultivars, which showed different stresssensitivity, with the smooth-leafed 'Flester' having the lowest weight-loss. The NMR untargeted profiling enriched nutritive dataset with previously unreported compounds, was effective to highlight metabolic variations due to G, Y and GxY interaction and to address the inulin pathway. The enhancement of endive transcriptome allowed the identification of cultivar-specific (minimal in the least affected 'Flester') and cultivar-independent DEGs. These latter were enriched in GO terms consistent with those of leaves of waterlogged crops, and inclusive of carbohydrate metabolic process. The narrow down on the sucrose-inulin branch showed that the lowered expression of a sucrose gene set in parallel with that of 1-SST accounted for sucrose and kestose contents drop of stressed leaves. WCGNA identified two anticorrelated modules harbouring candidate hub-genes, including 1-SST, that may control positively and negatively kestose levels. BS computational analysis further supported that GATA, DOF and WRKY TFs might control 1-SST.

Plant material, growth conditions, and sampling
Curly-leafed 'Domari' and 'Myrna' and smooth 'Confiance' and 'Flester' cultivars (Enza Zaden Italy s.r.l.) respectively belong to Cichorium endivia var. crispum and latifolium; cultivations took place on the same parcel at comparable periods (September to November) in 2011 and 2012 with standardized interventions (Table S1). Soil characteristics are in Table S1, while soil humidity variation was monitored by tensiometers (mod. 8060, −60 to 0 kPa, probe length 30 cm by Stelzner/Pronova, Germany). Briefly, tensiometers were inserted (in pre-drilled holes) between 15 and 25 cm of depth and three measurements per week were carried out in three different points of the parcel hosting each cultivar. Figure 1 combines data on soil humidity, rainfall, relative humidity and temperature; the meteorological dataset was available on the public service web (http://dati.lazio.it/catalog/dataset/ serie-storica-agrometeo). As for sampling, nine endive heads of each cultivar were selected according to market standards and weighted. Ten leaves (assumed as the target  Table S8) are above the circles (transcription factors) and squares (carbohydrate genes). TF putatively binding 1-SST are in red of consumption) were cut from each rosette and mixed to form a cultivar-specific pool that was subdivided into three distinct replicate batches (RB) of thirty leaves each. The RBs (of comparable weights) were frozen by liquid nitrogen, softly hand-crunched, and stored at −80°C. Aliquots from each RB were directly used for RNA isolation or lyophilized at −50°C for 72 h (lab freeze dryer, FreeZone®, Labconco Corp., Kansas City, MO, USA) and stored at −20°C for NMR quantification.

Metabolite extraction, NMR assignment and profiling
Lyophilized leaves were finely ground in liquid nitrogen using pre-cooled ceramic pestle and mortar. An amount of 25 mg was added to 0.90 mL of acetonitrile/water (1:1 v/v), the mixture was stirred for 30 s, centrifuged for 5 min (14,500 × g) and supernatant (0.74 mL) filtered through cotton wool in a glass vial. After solvent evaporation (N2 flux at room temperature), the residue was dissolved in 0.75 mL of 400 mM phosphate buffer (pH = 7) in D2O containing 1 mM 3-(trimethylsilyl)-propionic-2,2,3,3-d4 acid sodium salt (TSP) as an internal standard. The NMR spectra of aqueous extracts were recorded at 27°C on a Bruker AVANCE 600 NMR spectrometer operating at the proton frequency of 600.13 MHz. TSP signal of methyl group (d = 0.00 ppm) was used as an internal standard for 1H spectra. Each 1H spectrum was acquired by co-adding 256 transients with a recycle delay of 3 s. The residual HDO signal was suppressed using a pre-saturation. The experiment was carried out by using a 45°pulse of 7.0 µs, 32 K data points. All the spectra were processed by means of the Bruker TOPSPIN software (version 1.3). After Fourier transformation, manual phase correction and baseline correction selected resonances in 1H NMR spectra (Table S3) were integrated to calculate metabolite concentrations. The integral value of TSP methyl groups (9H) was used as a reference for quantification. The content of selected metabolites was expressed as in mg g -1 on dry weight basis. 2D NMR experiments, namely 1H-1H total correlation spectroscopy (TOCSY), 1H-13C heteronuclear single quantum coherence (HSQC), and 1H-13C heteronuclear multiple bond correlation (HMBC), were performed using the same experimental conditions previously reported 53 . The mixing time for the 1H-1H TOCSY was 80 ms. The 1H-13C HSQC experiment was performed using a coupling constant 1JC-H of 150 Hz, whereas the 80 ms delay for the evolution of long-range couplings was used in 1H-13C HMBC experiments.
Carbohydrate contents were converted into SSE by using conversion values from literature (Table S4).

Transcriptome assembly, annotation and identification of DEGs
The Lactuca sativa draft assembly (Lsat_Salinas_v7, GCA_002870075.1) was used as a reference genome to guide the endive transcriptome assembly. RNA from apices, stems, leaves, and roots of Domari cultivar at transplant and commercial maturation stages were sequenced as previously described 19 . The paired-end reads (NCBI-SRA: SRX3385280) were aligned to the genome by STAR v.2.7.0e (https://github.com/alexdobin/ STAR/releases), partitioned according to locus, and assembled at each locus by Trinity v.2.8.2 (https://github. com/trinityrnaseq/trinityrnaseq/). Subsequently, the For each cultivar, differential gene expression analysis between Y2 vs Y1 was carried out with the Bioconductor edgeR package and transcripts with false discovery rate (FDR) ≤ 0.05 and an absolute log 2 fold change ≥1 were defined as DEGs. Quantitative PCR (qPCR) and normalization procedures were previously detailed 19 by using the primers listed in Table S9.

Network analysis
A co-expression gene network of TF and carbohydrate metabolic process (GO:0005975) genes was constructed using the WGCNA software package v1.68 (https:// horvath.genetics.ucla.edu/html/CoexpressionNetwork/ Rpackages/WGCNA/). We filtered out transcripts that were lowly expressed (RPKM < 1) in more than 70% of the libraries and had incomplete sequences (full-length ratios <80%) as compared to the respective NCBI blast hits; 2129 genes that satisfied these thresholds were retained. Normalized and log-transformed gene expression data were corrected for batch effect and the adjusted values were used for signed network construction and module detection. Briefly, soft thresholding power ß of 22 was chosen based on a scale-free topology approximation criterion, adjacency and dissimilarity based on topological overlap were computed. As for the detection of co-expressed gene clusters (modules), the Dynamic Tree cut algorithm (minimal module size of 30) and a branch merge cut height of 0.25 were used. The first principal component of each module (module eigengene, ME) was used to condense the gene expression variability within each module. MEs were exploited to study higher-order relationships among the modules and to relate gene co-expression patterns to carbohydrate amounts. The correlations between individual gene expressions and carbohydrate contents were reported as gene significance (GS) values. The correlation between a gene expression profile vs the ME (module membership, MM) was used to quantify how close a gene is to a given module. The sum of the adjacencies within a given module (intramodular connectivity, Kin) was calculated to measure the coexpression between a gene and other module members. As for the most significant modules vs kestose contents ('Blue' and 'Turquoise' modules), genes with the highest (top 10%) GS, MM and Kin were selected and those with a connection strength (edge weight) ≥0.2 were visualized with Cytoscape.

Phylogenetic analyses and computational search of binding sites in gene promoters
Phylogenetic analysis was performed based on Cichorieae (C. endive, C. intybus and L. sativa) deduced proteins belonging to the glycoside hydrolase family 32 (GHF32). Fully-and nearly full length amino acid sequences from endive were selected those bearing the glyco_hydro_32N (PF00251) and glyco_hydro_32C (PF08244) domain annotations. Complete protein sequences from C. intybus and L. sativa were retrieved from NCBI protein db (access numbers are in Fig. S3 legend). ClustalW was used in multiple sequence alignments and the MEGA software (v. 7) running the neighbour-joining method allowed phylogenetic analyses and tree assembly. As for transcription binding site analysis, the 1070 base pair long genomic sequence upstream the ATG of the chicory (GenBank: EU545648.1) and lettuce (GenBank: CM022519.1, base location: 204758200..204759269) 1-SST orthologues were analyzed by PLANTPAN 3.0 (http://plantpan.itps.ncku. edu.tw/).

Statistical analyses
All data (three biological and three analytical replicates) were analyzed according to a completely randomized design in a two-way ANOVA (genotypes × year of cultivation) by R studio script. The separation of means was obtained by Least Significant Difference (LSD) test. For visual analysis of the data, principal component analysis (PCA) was performed on mean centred and standardized data (unit variance scaled). The data matrix submitted to PCA was made of 8 observations (2 cultivation years × 4 genotypes) and 30 variables. The results were shown as biplots of scores (treatments) and loadings (variables) using R studio script. Pearson correlations were calculated using the "rcorr" function in the Hmisc package within the R environment (v 3.4.3).