A database of high-resolution MS/MS spectra for lichen metabolites

While analytical techniques in natural products research have massively shifted to liquid chromatography-mass spectrometry, lichen chemistry remains reliant on limited analytical methods, Thin Layer Chromatography being the gold standard. To meet the modern standards of metabolomics within lichenochemistry, we announce the publication of an open access MS/MS library with 250 metabolites, coined LDB for Lichen DataBase, providing a comprehensive coverage of lichen chemodiversity. These metabolites were donated by the Berlin Garden and Botanical Museum from the collection of Siegfried Huneck to be analyzed by LC-MS/MS. Spectra at individual collision energies were submitted to MetaboLights (https://www.ebi.ac.uk/metabolights/MTBLS999) while merged spectra were uploaded to the GNPS platform (CCMSLIB00004751209 to CCMSLIB00004751517). Technical validation was achieved by dereplicating three lichen extracts using a Molecular Networking approach, revealing the detection of eleven unique molecules that would have been missed without the implementation of the LDB in GNPS. From a chemist's viewpoint, this database should help streamline the isolation of formerly unreported metabolites. From a taxonomist's perspective, the LDB offers a versatile tool for the chemical profiling of newly reported species.


Background & Summary
In the 19th century, lichens were still considered to be lower plants but around the time the general concept of symbiosis was being developed, they were found to be associations between a fungus and an alga 1,2. It was soon unveiled that this original lifestyle was accompanied by a specific chemodiversity. As knowledge regarding lichen chemistry progressed, tailored analytical approaches to harness this specific chemistry were developed. Accordingly, during the middle of the twentieth century Asahina and Shibata performed microcrystallizations to visually identify lichen metabolites, and started elucidating their structures [3][4][5]. TLC was later introduced in an attempt to dereplicate lichen metabolites more precisely, and standardized migration solvent mixtures were designed in 1970 by Culberson & Kristinsson 6, facilitating chemical profiling and identification of species by comparison to co-eluted standards. Although lichen chemicals were historically used as taxonomic markers, the delimitation of species based solely on chemistry was treated with caution. Major compounds and their biosynthetically related minor satellite compounds were however investigated as a means to classify lichens in chemosyndromes to better understand the evolution of closely related taxa through their chemistry, although biomolecular data is now more commonplace [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. Data accumulated during the 20th century was summarized in 1996 by Huneck and Yoshimura in their compendium "Identification of Lichen Substances", comprising spectroscopic data for over 850 lichen chemicals, including TLC retardation factors, infra-red data, electron impact mass spectrometry signals, UV/visible spectra and NMR landmarks 24. Even though a wide range of analytical techniques have been used to study lichen metabolites [25][26][27][28][29], the favoured approach among lichenologists remains TLC analysis [30][31][32][33][34].
Because of its accessibility, it is still widely used today along with spot test reactions when describing

1 CNRS, ISCR (Institut des Sciences Chimiques de Rennes)-UMR 6226, Univ Rennes, F-35000, Rennes, France.

three data-dependent MS/MS scans of the first, second, and third most intense ions from the first scan event. MS/MS settings were the following: three collision energies for the negative mode (10, 25, 40 eV, and additionally 2.5, 5, and 7.5 eV for depsides), three for the ESI+ and APCI modes (5, 20 and 35 eV), default charge of 1, isolation width of m/z 2. Purine (C5H4N4, m/z 121.050873 (positive)), trifluoroacetic acid (CF3CO2H, m/z 112.98559 (negative)) and HP-0921 (hexakis(1H, 1H, 3H-tetrafluoropropoxy)-phosphazene, C18H18F24N3O6P3, m/z 922.009798 (positive), 1033.988109 (negative, trifluoroacetate adduct)) were used as internal lock masses. Full scans were acquired at a resolution of 10 000 (m/z 922) and 4 000 (m/z 121) (positive polarity) and 10 000 (m/z 1033) and 4 800 (m/z 112) (negative polarity).
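The lock-mass values quoted above can be cross-checked from the molecular formulas. The following Python sketch is an illustration only and is not part of the acquisition software: the function names are hypothetical, and the monoisotopic masses are standard reference values rather than figures taken from this article.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard reference values,
# not taken from the article).
MONO = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.9949146196,
    "F": 18.99840322,
    "P": 30.97376163,
}
ELECTRON = 0.00054857990946
PROTON = MONO["H"] - ELECTRON


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule from a flat formula such as 'C5H4N4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula):
    """m/z of the singly charged [M+H]+ ion."""
    return neutral_mass(formula) + PROTON


def mz_deprotonated(formula):
    """m/z of the singly charged [M-H]- ion."""
    return neutral_mass(formula) - PROTON


purine_pos = mz_protonated("C5H4N4")                 # purine, positive mode
tfa_neg = mz_deprotonated("C2HF3O2")                 # trifluoroacetic acid, negative mode
hp0921_pos = mz_protonated("C18H18F24N3O6P3")        # HP-0921, positive mode
hp0921_tfa_neg = neutral_mass("C18H18F24N3O6P3") + tfa_neg  # trifluoroacetate adduct
```

Under these assumptions, the four computed values agree with the quoted lock masses to well within 0.0001 m/z units.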
Database constitution. 309 files in the Agilent .d format were thus generated and converted to the mzXML format using the MSConvert 60 module from ProteoWizard 61,62. Raw converted data were then processed using a custom script in the R 3.6.0 language 63 with the MSnbase package 64 to isolate MS/MS spectra at each collision energy for the given metabolite and save them as one-scan mzXML files. Additionally, a merged MS/MS spectrum was generated as a one-scan mzXML file by averaging the previously generated spectra for a given compound. Merged mzXML files were assembled in a single mgf file using a private online workflow provided by the GNPS platform and uploaded with a metadata file as an open access MS/MS database (available at https://gnps.ucsd.edu/ProteoSAFe/gnpslibrary.jsp?library=LDB_POSITIVE for the positive mode spectra, and https://gnps.ucsd.edu/ProteoSAFe/gnpslibrary.jsp?library=LDB_NEGATIVE for the negative mode spectra). Each spectrum was manually curated to assess the quality of the fragmentation and the identity of fragmented metabolites.
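The averaging step can be illustrated with a short Python sketch. It is not the authors' R script, and MSnbase may combine peaks differently; the fixed m/z bin width, the function name and the example peak lists are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_spectra(spectra, bin_width=0.01):
    """Average several centroided spectra into a single merged spectrum.

    spectra: list of spectra, each a list of (mz, intensity) tuples, for
    example one spectrum per collision energy. Peaks falling in the same
    m/z bin are pooled; the merged intensity is the mean over all input
    spectra (a peak absent from a spectrum counts as zero there).
    """
    if not spectra:
        return []
    pooled = defaultdict(lambda: [0.0, 0.0])  # bin -> [sum(int), sum(int * mz)]
    for peaks in spectra:
        for mz, intensity in peaks:
            key = round(mz / bin_width)
            pooled[key][0] += intensity
            pooled[key][1] += intensity * mz
    merged = []
    for key in sorted(pooled):
        total_intensity, weighted_mz = pooled[key]
        if total_intensity > 0:
            # intensity-weighted m/z, mean intensity over the input spectra
            merged.append((weighted_mz / total_intensity,
                           total_intensity / len(spectra)))
    return merged


# Hypothetical peak lists at three collision energies (not real data).
low_energy = [(200.0, 90.0), (150.0, 10.0)]
mid_energy = [(200.0, 30.0), (150.0, 60.0), (100.0, 30.0)]
high_energy = [(150.0, 30.0), (100.0, 90.0)]
merged = merge_spectra([low_energy, mid_energy, high_energy])
```

In this toy case the merged spectrum keeps all three fragment masses, so a single library entry reflects both the mild and the harsh fragmentation conditions.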
Database description and use in molecular network-based dereplication. The LDB contains 309 merged MS/MS spectra from 250 lichen metabolites ionized in Negative Ion mode ESI-MS (NI-ESI) (226 spectra), Positive Ion mode ESI-MS (PI-ESI) (68 spectra) and APCI (15 spectra). Additionally, the 1011 MS/MS spectra at individual collision energies are available on MetaboLights. The available standards cover the vast majority of the scaffolds reported in the 1996 Huneck and Yoshimura compendium (Fig. 2a,b) and can therefore be expected to provide valuable information for lichen chemical profiling given the high degree of structural recurrence within these organisms. A further criterion for assessing the convenience of a database is whether general data acquisition settings can encompass as wide a diversity as possible of the metabolites it comprises. The occurrence of acidic functions in the vast majority of lichen metabolites was an incentive to analyse them in negative-ion mode, consistent with former reports 25. While the main analysis parameters (viz. 10/25/40 eV collision energies) afforded convenient tandem mass spectra for most structures investigated herein, two specific cases required further acquisition settings to be applied. First, depsides underwent extensive fragmentation under the aforementioned conditions. To remedy this, further analyses using lower collision energies (i.e. 2.5/5/7.5 eV) were carried out on this specific structural class, which afforded overall better results than the former parameters. All six spectra related to depsides were implemented into the LDB. Second, one should keep in mind that electrospray ionization mainly facilitates the formation of deprotonated molecules but rarely leads to radical ions 65.
Therefore, in specific molecular environments where phenolic groups can engage in intramolecular hydrogen bonds, their deprotonation is prevented and the molecule is not amenable to NI-ESI-MS detection 66 without dedicated analytical procedures. Within lichen chemodiversity, this configuration is mostly encountered in γ-pyrone-containing metabolites, i.e. quinones, xanthones and chromones, some of which could not be detected upon NI-ESI-MS analyses. However, irrespective of their detected/not-detected status in negative polarity, all of them provided satisfactory MS² spectra in PI-ESI-MS, which should be favoured when analyzing lichen species producing these structural series. Finally, terpenes and steroids were analyzed using an APCI source, which provides better results than ESI for low to medium-polarity compounds 67, although Atmospheric Pressure Photoionization would probably extend coverage further towards non-polar compounds. The database is available on GNPS for molecular networking applications. Fig. 2c,d show two molecular networks generated with the spectra from the LDB as input; each node (or molecule) was colored according to the chemical scaffold it represents, as described by Huneck and Yoshimura 24 (parameters in Table S1). Generally speaking, the molecular network generated in negative-ion mode is larger owing to the much higher number of compounds from the LDB analyzed in this mode, which should prevail in lichenochemistry. The molecular network obtained from the positive-ion mode revealed a more limited clustering, which may be related to the subset of compounds analyzed in this ionization mode also being among the most scattered in negative polarity analyses. Overall, the obtained molecular networks tended to cluster according to compounds' structural classes.
However, each scaffold did not produce a unique and homogeneous cluster in the retained settings and, conversely, seemingly unrelated metabolites sometimes came to cluster together as a consequence of their similar functional groups, which make them prone to undergoing similar neutral losses. Structural features accounting for the topology of molecular networks are not usually worth investigating too thoroughly, since the topology depends on the definition of the cosine threshold, the stringency of which considerably affects the obtained outcome. Nevertheless, some clustering behaviours could be rationalized in a straightforward manner. Illustrative chemical representatives of the discussed clusters are displayed in Fig. 2. An interesting example is that of depsides (Fig. 2c, in cyan), which were split into three major clusters. Cluster #2 contained depsides with three to seven-membered alkyl side-chains, whereas cluster #3 only gathered depsides lacking such lengthy side-chains. Finally, cluster #6 comprised tridepsides and a single didepside, lecanoric acid, which can be readily explained by its diaromatic core being encountered in all tridepsides implemented in our LDB collection. A similar discrimination by side-chain length into separate clusters could be noted for depsidones (clusters #4 and #5, respectively). A last example of rational structure-dependent clustering is that of xanthones, which cluster differently depending on their monomeric or dimeric status (Fig. 2d). The southern part of cluster #1 is an example of structurally inhomogeneous compounds (Fig. 2c) that somehow came to exhibit close enough fragmentation processes to belong to the same cluster. Such "erratic" structures are represented by quinones (Fig. 2c,d, in red), which produce few fragments in MS/MS and are found mostly as self-looped nodes or related to compounds having similar appendages.
Each molecular network presented in this article is available on the GNPS platform: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=cc4925fa2ccd43b790b708576f47e7b5 for the negative mode (Fig. 2c) and https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c79d748b515b4357a515bcec1435e6a1 for the positive mode (Fig. 2d). Each cluster can be found with the same numbering as in the figure, as well as each node along with their spectra and identifications. To the best of our knowledge, this is the first sizeable MS/MS database of lichen compounds.
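The dependence of network topology on the cosine threshold can be illustrated with a minimal Python sketch. It uses a plain cosine score on binned peaks, which is simpler than the modified cosine scoring used by GNPS; the bin width, the function names and the three toy spectra are hypothetical.

```python
import math


def binned(peaks, bin_width=0.02):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    vector = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine(peaks_a, peaks_b, bin_width=0.02):
    """Plain cosine similarity between two binned spectra (0 to 1)."""
    va, vb = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    dot = sum(value * vb.get(key, 0.0) for key, value in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def network_edges(spectra, threshold):
    """Connect every pair of spectra whose cosine score reaches the threshold."""
    names = sorted(spectra)
    return [(p, q)
            for i, p in enumerate(names)
            for q in names[i + 1:]
            if cosine(spectra[p], spectra[q]) >= threshold]


# Toy spectra (not real data): A and B share two of three fragments, C shares none.
toy = {
    "A": [(100.0, 1.0), (150.0, 1.0)],
    "B": [(100.0, 1.0), (150.0, 1.0), (200.0, 1.0)],
    "C": [(300.0, 1.0)],
}
edges_loose = network_edges(toy, threshold=0.7)
edges_strict = network_edges(toy, threshold=0.9)
```

Here A and B score about 0.82, so they are linked at a threshold of 0.7 but become isolated nodes at 0.9, which is the kind of threshold-driven change in clustering discussed above.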

Data Records
MS/MS data of the LDB can be found on the GNPS (merged spectra) and on MetaboLights (merged and individual collision energy spectra). They can be accessed through GNPS on the library webpage (https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp), each spectrum having its own accession number: CCMSLIB00004751209 to CCMSLIB00004751434 for negative spectra, CCMSLIB00004751435 to CCMSLIB00004751517 for positive spectra. The spectral collection is also available for download from MetaboLights under the identifier MTBLS999 59.

(Table S2). Acetone extracts of the lichens were analysed in negative polarity LC-MS and the files produced were subjected to an MZmine-GNPS workflow (parameters in Tables S3 and S4). A molecular network was thus produced and dereplication was carried out by three different methods: (i) using only the LDB, (ii) using all GNPS spectral libraries including the LDB, (iii) using all GNPS spectral libraries excluding the LDB. The extent of the advantages offered by the LDB was assessed by comparing the hits between the three dereplication methods. The molecular network generated is presented in Fig. 3 with colour-coded nodes depending on the aforementioned dereplication method: (i) green nodes were found exclusively in the LDB, (ii) yellow nodes were identified using all GNPS libraries including the LDB, (iii) red nodes could not be dereplicated. A total of 15 unique molecules were thus dereplicated, with 11 of them being exclusively identified with the LDB, and four being shared with other GNPS libraries. None of the hits were exclusive to other spectral libraries as all were already present in the LDB (Table 1). As annotations generated by the gap-filling function of MZmine 68 (same m/z and RT) are associated with a lesser degree of confidence, the corresponding tags are ranked as 5 according to the widely accepted metabolomics confidence levels defined by Schymanski et al. 69 (Fig. 4).
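The comparison between the three dereplication runs amounts to set operations on the lists of annotated compounds. In the minimal Python sketch below, the compound identifiers are hypothetical placeholders and the function name is an assumption; only the counts (15 hits in total, 11 exclusive to the LDB, four shared) come from the text.

```python
def compare_dereplication(ldb_only, all_without_ldb):
    """Split annotated compounds by the library that identified them.

    ldb_only: compounds annotated when searching the LDB alone.
    all_without_ldb: compounds annotated with all GNPS libraries except the LDB.
    """
    ldb_hits, other_hits = set(ldb_only), set(all_without_ldb)
    return {
        "exclusive_to_ldb": ldb_hits - other_hits,    # green nodes
        "shared": ldb_hits & other_hits,              # yellow nodes
        "exclusive_to_other": other_hits - ldb_hits,  # none reported in the text
    }


# Hypothetical placeholder identifiers, sized to match the counts in the text.
ldb_hits = [f"compound_{i:02d}" for i in range(1, 16)]      # 15 hits with the LDB
other_hits = [f"compound_{i:02d}" for i in range(12, 16)]   # 4 hits without the LDB
summary = compare_dereplication(ldb_hits, other_hits)
counts = {category: len(members) for category, members in summary.items()}
```

With these placeholder inputs, `counts` reproduces the 11/4/0 split described above.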
A tentative assignment of some unannotated nodes was performed from the molecular network with the support of the SIRIUS software 70. In the context of lichenology, molecular networks are particularly useful to emphasize characteristic sets of products, also designated as chemosyndromes. These metabolites are structurally similar because of their biosynthetic interconnections and can therefore be expected to form clusters in a molecular network. Such chemosyndromes are often sustained by a major metabolite accompanied by several minor satellite compounds 7.

Dereplication of the Ophioparma ventosa extract. The chemistry of the crustose lichen Ophioparma ventosa has been investigated by several research groups over the last decades and its specialized metabolome can be regarded as rather complex. This is related to the presence of some constant metabolites (thamnolic acid, decarboxythamnolic acid, usnic acid, divaricatic acid, haemoventosin and a wealth of further pyranonaphthoquinone pigments) 71,72 along with some additional compounds, like stenosporic acid, that may or may not occur and which are, at least partly, inferred to originate from overgrown lichen species [72][73][74]. As the Ophioparma sample used was rather poor in apothecia, haemoventosin and its related pyranonaphthoquinones were not detected. Some recent chemical reports dealing with this lichen species, including thorough phytochemical investigations by our group 28,72, hinted that the chemistry of this species is quite sharply defined.

Metadata. Retention times (RTs
Automatic dereplication using the LDB allowed the annotation of four out of the five metabolites expected in negative mode: thamnolic, usnic, stenosporic and divaricatic acids (Table 1). Four other compounds were detected through the gap-filling strategy (Fig. 4A). None of the latter were previously reported in the literature for Ophioparma spp.
These identified metabolites are clustered with some unidentified ones. Divaricatic and stenosporic acids are clustered with two nodes at m/z 373.1315 and 487.0743. While the latter can only be assumed to stand for an unknown depside, a likely candidate to account for the former is nordivaricatic acid (4-O-demethyldivaricatic acid), previously reported to co-occur with both divaricatic and stenosporic acids in several lichens, including O. ventosa (Fig. S2) 73. Another noteworthy cluster for this lichen contains the ion at m/z 479.0596, seemingly a dibenzofuran-related metabolite of usnic acid (Fig. S3). The joint occurrence of divaricatic acid and stenosporic acid is an illustration of a chemosyndrome, both depsides bearing side chains of moderate length, the latter being a minor satellite compound of the former 73.
Dereplication of the Evernia prunastri extract. The fruticose lichen Evernia prunastri has been thoroughly investigated for its chemical content as it is widely used in the fragrance industry. The typical odour of oak moss is related to the hydrolysis of odourless depsides to yield a suite of fragrant monoaromatic compounds 75. Expected compounds as historically described by Culberson 76 are atranorin, chloroatranorin, evernic and usnic acids. Additionally, Joulain and Tabacchi published a critical review of the metabolites reported for Evernia prunastri, which reached more than 170 structures 75. As mentioned by the authors, some compounds have to be considered with utmost caution as the sources are sometimes a mixture of plant and lichen material in addition to environmental pollutants, resulting in the detection of several petroleum products by GC. Automatic dereplication allowed the straightforward identification of all four classically described compounds (atranorin, chloroatranorin, evernic and usnic acids) in addition to lecanoric acid and perlatolic acid, the latter two metabolites being newly reported in this extensively studied lichen model (Table 1). Physodic acid, previously reported in Evernia prunastri, was detected by gap-filling 77. Additional gap-filling-generated metabolites were alpha-alectoronic acid and physodalic acid (Fig. 4B).
Unlabeled nodes clustered with evernic acid include two ions at m/z 365.0443 and 347.0764, seemingly sharing a depside scaffold (Figs. S4 and S5). Evernia prunastri seems to contain the same usnic acid derivative as Ophioparma ventosa at m/z 479.0596.
Dereplication of the Hypogymnia physodes extract. This foliose species is known for producing an array of structurally diverse lichen metabolites comprising various orcinol depsides and depsidones differing in the length and hydroxylation status of their side chains 78. Notably, recent UHPLC-MS² based phytochemical investigations were performed on H. physodes 79, so it can be inferred that a broad coverage of this lichen's chemistry is available to evaluate the degree of information conveyed by our current dereplication pipeline. Expected metabolites are usnic acid, atranorin and chloroatranorin, along with a vast range of orcinol depsidones and their derivatives (viz. physodic acid, 3-hydroxyphysodic acid, 4-O-methylphysodic acid, 2′-O-methylphysodic acid, isophysodic acid, physodalic acid, 3-hydroxyphysodalic acid, conphysodalic acid, protocetraric acid and fumarprotocetraric acid), including α-alectoronic acid, a minor metabolite recently revealed to occur within this lichen 79.
Outcome of the technical validation. The implementation of the LDB into the feature-based molecular networking workflow allowed the identification of several metabolites expected in these lichens. Some additional molecules were also reported for the first time from these extensively studied models. These seemingly surprising results may be related to formerly unreported chemosyndromic variations 7 and/or to the higher sensitivity of this analytical strategy compared to formerly used techniques. Most hits were unique to the LDB, emphasizing the fact that most of these molecules were not reported before in the MS/MS libraries of the GNPS. Metabolites absent from the LDB but detected in the studied lichens could be putatively identified, while completely unknown metabolites could be detected and linked to known scaffolds in lichens. These outcomes validate the adequacy of the LDB to dereplicate lichen metabolites. This network can be consulted on the GNPS platform at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=ee1285c8de3a45cda13d719271570dc7.

Code availability
Data in .mzXML format were filtered and merged with an R script available at https://github.com/dolivierj/MSDB_Maker, using the R 3.6.0 language 63 with the MSnbase 64 package.