Ash leaf metabolomes reveal differences between trees tolerant and susceptible to ash dieback disease

European common ash, Fraxinus excelsior, is currently threatened by Ash dieback (ADB) caused by the fungus, Hymenoscyphus fraxineus. To detect and identify metabolites that may be products of pathways important in contributing to resistance against H. fraxineus, we performed untargeted metabolomic profiling on leaves from five high-susceptibility and five low-susceptibility F. excelsior individuals identified during Danish field trials. We describe in this study, two datasets. The first is untargeted LC-MS metabolomics raw data from ash leaves with high-susceptibility and low-susceptibility to ADB in positive and negative mode. These data allow the application of peak picking, alignment, gap-filling and retention-time correlation analyses to be performed in alternative ways. The second, a processed dataset containing abundances of aligned features across all samples enables further mining of the data. Here we illustrate the utility of this dataset which has previously been used to identify putative iridoid glycosides, well known anti-herbivory terpenoid derivatives, and show differential abundance in tolerant and susceptible ash samples.

European common ash, Fraxinus excelsior, is currently threatened by Ash dieback (ADB) caused by the fungus, Hymenoscyphus fraxineus. To detect and identify metabolites that may be products of pathways important in contributing to resistance against H. fraxineus, we performed untargeted metabolomic profiling on leaves from five high-susceptibility and five low-susceptibility F. excelsior individuals identified during Danish field trials. We describe in this study, two datasets. The first is untargeted LC-MS metabolomics raw data from ash leaves with high-susceptibility and low-susceptibility to ADB in positive and negative mode. These data allow the application of peak picking, alignment, gap-filling and retentiontime correlation analyses to be performed in alternative ways. The second, a processed dataset containing abundances of aligned features across all samples enables further mining of the data. Here we illustrate the utility of this dataset which has previously been used to identify putative iridoid glycosides, well known anti-herbivory terpenoid derivatives, and show differential abundance in tolerant and susceptible ash samples.

Background & Summary
The globalisation of trade and logistical biosecurity challenges at port of entry has led to an increasing number of alien species invading countries where they have often adapted to new environments and infected exotic flora. Furthermore, climate change is likely to modify plant pathogen profiles, further contributing to emerging pathogens 1 . Most visible and socially impactful are tree pathogens, which have the potential to dramatically modify the landscapes of countries or even continents. Recent examples include Dutch elm disease which began in the UK and chestnut blight in the USA 2 . Today, European common ash (Fraxinus excelsior) is currently threatened by Ash dieback (ADB) which was first reported in the early 1990's in north-eastern Poland 3 and from where it has rapidly spread across Europe [4][5][6][7][8] . ADB, caused by the fungus Hymenoscyphus fraxineus, is currently found in most European countries, and was confirmed in the UK and Ireland in 2012. In the UK alone, where in excess of 100 million trees are at threat, it is estimated that over 1,000 species rely on ash trees for all or part of their lifecycle, including wood mice, squirrels, bullfinches, wrens, bats and beetles. Forty five of these species are considered obligate 9 . The causal agent of ash dieback, H. fraxineus most probably originated from East Asia. Hosoya et al. 10 reported the presence of the fungus 10,11 in Japan on its native host Manchurian ash (Fraxinus mandshurica) where it exhibits a hemi-biotrophic lifestyle 12,13 . By contrast, on a range of exotic hosts, not just exclusively Fraxinus spp. but extending into other Oleaceae, H. fraxineus displays a necrotrophic lifestyle. Interestingly, emerald ash borer (Agrilus planipennis), a native Chinese buprestid beetle, has devastated tens of millions of Fraxinus americana (American ash) in the USA 14 . Yet, both A. planipennis and H. fraxineus appear to co-exist with Manchurian ash in their native habitats. The recent discovery of a genetic basis to ADB susceptibility provides the opportunity for selective breeding approaches 15-18 that will facilitate the identification and propagation of superior ash trees. Tree breeding programmes have been established to select tolerant/resistant germplasm. Nevertheless, such programmes take many years to establish, ideally require access to replicated field trials, and need large breeding population sizes to avoid loss of rare alleles and maximisation of long-term commercial traits and may not reveal the genetic background behind the resistance. An ideal solution would be to have robust markers to assist in selective breeding for ADB and recent progress on whole genome sequencing 19 and association transcriptomics 20 will facilitate identification of molecular markers for assisted ash breeding and help provide a molecular understanding of the tolerance identified by field testing. However, these studies are confounded by the high degree of genetic heterozygosity due to extensive outcrossing, making the identification of genetic markers challenging. Moreover, breeding value assessments of Danish tolerant 'Tree 35' concluded that resistance was quantitative and that there was no evidence to suggest that resistance to ADB operating in F. excelsior was a consequence of a single or a few resistance genes (qualitative resistance), thus race specificity is unlikely 21 .
Despite the extensive genetic heterogeneity in European F. excelsior, ADB tolerant trees are increasingly being reported across Europe, e.g., Havrdová et al. 22 and Muñoz et al. 23 . Given (i) the evidence for quantitative resistance, (ii) that resistance mechanisms of deciduous trees are thought to include a combination of constitutive and induced chemical defences 21 and (iii) as wind borne ascospores are present for a sustained period over the summer, the mechanism underpinning foliar infection tolerance to ADB is highly likely to have a major constitutive component. Based upon this knowledge we undertook an unbiased global metabolic profiling of ash. We used material from the highly advanced Danish study that identified 'Tree 35' 21 . We first tested whether the methodological approach could discriminate tolerant and susceptible ash. Then, using a second independent set of individuals, we undertook a detailed metabolomics profiling of these Danish ash. Here we provide our untargeted LC-MS metabolomics raw data from ash trees with high-susceptibility and low-susceptibility to ADB, a processed dataset containing abundances of aligned features across all samples, and use this dataset to highlight a family of small molecules from the iridoid glycoside class, that discriminate tolerant and susceptible ash. We demonstrate that iridoid glycosides, well known antifeedant molecules, are markedly reduced in tolerant ash leaves. The ecological and breeding implications of this is intriguing, as it implies breeding for ADB tolerance may unintentionally confer enhanced susceptibility to emerald ash borer. Given its presence in Russia, and hence representing an emergent threat to Europe, these findings certainly warrant further investigation.
Our study provides a wealth of potential information for future investigation and are the only set of replicated, unbiased LC-MS metabolite data from verified tolerant and susceptible European ash and these are available with associated metadata in Metabolights 24,25 [MTBLS372]. Thus different data processing algorithms can be applied to reuse these data. As a striking 25% of the ash genome encoded unique (orphan) genes 19 , we see genuine utility in using this dataset to facilitate metabolite predictions to support studies aimed at identifying gene function. While we have focussed on the iridoid glycosides, the processed dataset contains features that discriminate tolerant trees and may be developed into rapid screens for ADB tolerant ash, e.g., in breeding trials.

Methods
These methods are expanded from descriptions previously published in Nature 19 . A schematic overview of the methods are shown in Fig. 1.

Collection of leaf material
Origin of genotypes used: Leaf material was sampled from trees with very different levels of susceptibility to ADB. Five trees were selected among the individuals exhibiting very low levels of symptoms when subject to natural infection of H. fraxineus: R14164C (HGH-A), R14184A (HGH-B), R14193A (HGH-C), R14198B (HGH-D), R14181 (HGH-E) and five trees were selected among trees exhibiting severe symptoms: R14127 (UGH-F), R14120 (UGH-G), R14169 (UGH-H), R14156 (UGH-I), and 25UTaps (UGH-J). All sampled trees were considered unrelated.
The R-trees were selected from a genetic field trial established in 2004 near Randers, Denmark (N56°5 0, E10°04). The trial comprised 2,448 trees derived from seed collected in 2001 from 101 mature Danish trees. The progenies were monitored annually for symptoms of natural infections from 2008-2011. Material used in this study was derived from selections made in 2012, by which time mortality had increased to 68%, based upon symptom presentation 15 .
The 25UTaps tree was selected among 39 clones based on the average level of crown damage assessed between 2007-2011 across the two Danish test sites at Tuse Naes (N55°45, E11°42) and Tapse (N55°24, E9°27). These represent mature trees selected in Denmark between 1934-1944 (except 25UTap and 27UTaps selected in 1994) and grafted on F. excelsior rootstock before establishment in a clonal trial 16 . The level of natural infections in both clonal trials was very high resulting in substantial mortality although all 39 genotypes survive as the clones were originally established with >50 replications (ramets) per clone.
ADB damage of the selected trees was re-assessed in trials in June 2013 and 2014, using the average score to characterise their level of susceptibility (Table 1). Three R-trees sampled as highly susceptible in 2012 were dead by 2013 and by 2016 all susceptible R trees were dead in Randers. 25UTaps is alive and still present in the clonal trials (Tuse Naes and Tapse). All healthy trees retained a 0-10% damage score in 2016, except for R14181 (HGH-E), which had some crown damage (25-50%).
Scions were collected from the trees in either Randers or Tapse during January and February 2013, grafted onto rootstocks of F. excelsior seedlings, potted and placed in a greenhouse at University of Copenhagen. Leaves were sampled in September 2014. All plants were symptom free at the time of sampling. Three leaves were sampled from each graft, flash frozen in liquid nitrogen and stored at −80°C.

Sample processing
In order to understand whether ash trees with low and high susceptibility to ADB vary in their metabolite profiles as well as their transcriptomes, we undertook untargeted metabolite profiling on a subset of trees of Danish origin from a genetic field trial (R-trees) 15 and a test panel (25UTaps) 16 . Untargeted metabolomics has not previously been applied to natural populations but has the potential to identify small molecules (or small molecule associations) that directly contribute to tolerance or resistance, particularly given limited evidence for qualitative resistance to ADB and that deciduous trees appear to deploy a combination of constitutive and induced chemical defences. We compared triplicate samples from five low-susceptibility Danish trees (R-14164C, R-14184A, R-14193A, R-14198B and R-14181) and five high-susceptibility trees (R-14169, R-14127, R-14156 R-14120 and 25UTaps). Three leaflets from each triplicate sample were freeze dried and gently crushed to mix tissue types. Approximately 100-150 mg of this material was ground to a fine powder in a 2 ml polypropylene microfuge tube using a TissueLyser (Qiagen; 2 × 1 min at 25 Hz). 10 mg was extracted in 400 μl 80% MeOH containing d5-IAA internal standard at 2.5 μg/ml ([ 2 H5] indole-3-acetic acid; OlChemIm Ltd, Czech Republic), centrifuged (10,000 g, 4°C, 10 min) and the pellet re-extracted in 80% MeOH. The pooled supernatants were filtered through a 0.2 μm PVDF syringe tip filter (Chromacol, Thermo Scientific, MA, USA).

Mass spectrometry
These leaf extracts were analysed using a Polaris C18 1.8 μm, 2.1 × 250 mm reverse phase analytical column (Agilent Technologies, Palo Alto, USA). 5 μl samples were resolved on an Agilent 1200 series Rapid Resolution HPLC system coupled to a quadrupole time-of-flight QToF 6520 mass spectrometer (Agilent Technologies, Palo Alto, USA). Mobile phases were as follows: positive ion mode; mobile phase A (5% acetonitrile, 0.1% formic acid), mobile phase B (95% acetonitrile with 0.1% formic acid). Negative ion mode; mobile phase A (5% acetonitrile with 1 mM ammonium fluoride), mobile phase B (95% acetonitrile). The following gradient was used: 0-10 min-0% B; 10-30 min-0-100% B; 30-40 min-100% B. The flow rate was 0.25 ml min − 1 and the column temperature was held at 35°C throughout. The source conditions for electrospray ionisation were as follows: gas temperature was 325°C with a drying gas flow rate of 9 l min − 1 and a nebuliser pressure of 35 psig. The capillary voltage was 3.5 kV in both positive and negative ion mode. The fragmentor voltage was 115 V and skimmer 70 V. Scanning was performed using the auto MS/MS function at 5 scans sec − 1 for precursor ion surveying and 4 scans sec − 1 for MS/MS with a sloped collision energy of 3.5 V/100 Da with an offset of 5 V. Scan speed varied based on precursor abundance with a maximum of 1.15 s between MS and final (5th) MS/MS (in practice this was never observed to exceed 0.8).

Data processing
Positive and negative ion data (centroid) were converted into mzData using the export option in Agilent MassHunter. Peak identification and alignment was performed using the Bioconductor R package XCMS 26 and features were detected using the centWave method 27 for high resolution LC/MS data in centroid mode at 30 ppm. The samples were grouped into 'tolerant' and 'susceptible' folders/groups before processing and all samples were aligned and peaks identified in a single batch. Changes to the default parameters were: mzdiff = 0.01, peakwidth = 10-80, noise = 1,000, prefilter = 3,500. Peaks were matched across samples using the density method with a bw = 5 and mzwid = 0.025 and retention time correlated using the obiwarp algorithm 28 with profStep = 0.5. Missing peak data was filled in the peaklists generated from the ADB low susceptibility ash leaf samples compared to the peaklists generated from the ADB susceptible leaves. The resulting peaklists were annotated using the Bioconductor R package, CAMERA 29 . The peaks were grouped using 0.05% of the width of the full width at half maximum (FWHM) and groups correlated using a P-value of 0.05 and calculating correlation inside and across samples. Isotopes and adducts were annotated using a 10 ppm error.  Statistics Statistical analysis and modelling was performed using MetaboAnalyst v3.028 30,31 , with the following parameters. Missing values were replaced using a (K-nearest neighbour) KNN missing value estimation. Data was filtered (40%) to remove non-informative variables using the interquartile range (IQR). Samples were normalised using the internal standard d5-IAA (POS: M181T1448; NEG: M179T1382). Data was auto-scaled. Peaks from the three replicates, run in positive and negative mode were aligned with XCMS and features tested for practical significance to determine the differences between the tolerant and susceptible genotypes. In addition, PLS-DA was performed using MetaboAnalyst 3.0 allowing clear discrimination of tolerant and susceptible genotypes based on their metabolic profiles (Fig. 2a).
The individual features (putative metabolites) that contribute to the separation between the different classes were further characterised. We first applied a range of univariate and multivariate statistical tests to determine the importance of these features. This included variable influence on the projection (VIP) values derived from PLS-DA scores, practical significance, t-test, P-value, Benjamini and Hochberg FDR (False Discovery Rate) adjusted P-value, effect size, Random Forest analysis and MS/MS fragmentation network analysis. For example, using Random Forest, significant features were ranked by mean decrease   (Fig. 2b). Features that were shown to be of interest using the above statistical tests were individually checked by returning to the raw data and determining if they had been properly identified and aligned by XCMS. We also checked that those features of interest were absent in the blanks and the extraction buffer only samples.

Putative feature identification
Putative identification of features was performed using several approaches. Each feature of interest was analysed using MassHunter molecular formula estimation. This information, along with accurate mass and MS/MS spectra were used to interrogate existing literature, and databases including, but not limited to, KNApSAcK (http://kanaya.naist.jp/KNApSAcK/), Metlin (https://metlin.scripps.edu), ReSpect (http:// spectra.psc.riken.jp/), PubChem (https://pubchem.ncbi.nlm.nih.gov/), ChemSpider (http://www.chemspider.com/), mzCloud (https://www.mzcloud.org/) and Massbank (http://www.massbank.jp/). Additionally, features of interest identified in negative mode were extracted from MassHunter as CEF (compound exchange format) files. These were imported into MSC and searches against ChemSpider and PubChem databases and a custom database using mol files generated from papers. Searches were performed with default parameters apart from the minimum MSC score was set to 50.0. The following parameters were used: MS/MS isolation window was set to first isotope only with default ionization set to protons. Multiple C.E. treatment was set to simple average. Only the 30 most abundant ions were used and formulae contained in the data files was used (i.e., from MassHunter Molecular Formula Prediction). Elements used for MFG composition were H: 2-1,000, C: 1-1,000, N 0-1,000, O:0-1,000 and S: 0-1,000. Height uncertainty was set to 7.5%. Only a representative tautomer was set, the number of total rings and the number of aromatic rings were unconstrained.    N5. N5 has a similar fragmentation pattern and retention time to N3 with a mass difference of 384.12 (i.e., C 21 H 20 O 7 ). The abundant 403 fragment has the same mass as Oleoside 11-methyl ester/Oleoside 7methyl ester and shares a fragmentation fragment (223) with these isomers suggests a glycosylated dimer of Oleoside 11-methyl ester or Oleoside 7-methyl ester (((404 × 2) − 1)+162 = 969) 35 .  22 . This is the molecular formula of Jaspolyanoside identified in Jasminum polyanthum (Oleaceae) 38 and Olea europaea (Oleaceae) 39 . This compound is an aromatic conjugated secoiridoid glucoside and the loss of 162 Da (933 → 771) is consistent with it being a glucoside.

Data Records
For this study, we submitted the raw untargeted LC-MS metabolomics data for three replicates of 5 susceptible and 5 tolerant ash dieback genotypes of Danish F. excelsior run in both positive and negative mode to MetaboLights 24,25 (Data Citation 1). Additionally, the XCMS alignment files were deposited (Data Citation 1). The datasets described (Data Citation 1) were previously published in our related work in the journal Nature 19 . negative mode by using the peak areas of all features reported by XCMS after missing values were imputed by MetaboAnalyst (Table 3).

Statistics
Several methods were used to validate discriminatory features including equivalence testing, PLS-DA and random forest analysis as described in the methods section.

Putative feature identification
Several approaches were taken to validate the putative identification of discriminatory features. An MS/ MS fragmentation network was generated after extracting the m/z of the MS/MS product ions (where abundance >5% and m/z >100) from the 13 discriminatory features identified in positive and negative mode using MassHunter Qualitative Analysis Version 4 and visualized using Cytoscape 40 indicating product ion masses which have been previously reported from fragmentation of iridoid glycosides 41 . Further validation was performed through a mirror plot (Fig. 3) comparing the MS/MS spectra of four features (N2-5) detected in negative mode with an ESI-TOF/IT-MS spectra of elenolic acid glucoside taken from the literature 42 . Finally, the accurate mass of MS/MS product ions from four discriminatory features identified in negative mode (N2-N4) were correlated with the structure of the putatively identified feature using MassHunter Molecular Structure Correlator (Agilent) (Fig. 4). Identification was not possible for those features with no fragmentation data, or lacking significant supporting adducts. Many additional features of interest were identified but require further validation to allow confident attributions, while some features did not provide fragmentation patterns. We thus restricted further identification and characterisation to discriminatory features from the iridoid glycoside class of compounds. These iridoid gylcosides, which have been previously documented in Oleaceae, are summarised in Supplementary Table S1 and Supplementary Fig. 1. Fraxinus excelsior is a member of the Olive family (Oleaceae), therefore data reported for this family were used to aid identification. To further validate our identifications we used an MS/MS fragmentation network approach and graphically illustrate the resulting network in Fig. 2b.
Based on ESI fragmentation patterns, the majority of features identified as having a practical significant difference between tolerant and susceptible genotypes of F. excelsior are likely to be from the same class of compounds. Common fragment ions are evident in many of these discriminating features, including m/z 179, 223 and 403, which are indicative of elenolic acid glucoside related molecules. Molecular networking confirms that these features are likely to be from the same family of compounds and we provide a detailed putative identification of each feature that is represented by box plots and spectra ( Supplementary Fig. 1).

Usage Notes
The general approach for sample processing and data analysis is described in the schematic (Fig. 1). Links to the software, resources and repositories used are stated below.