Background & Summary

In the 19th century, lichens were still considered to be lower plants, but around the time the general concept of symbiosis was being developed they were found to be associations between a fungus and an alga1,2. It soon became apparent that this unusual lifestyle was accompanied by a specific chemodiversity. As knowledge of lichen chemistry progressed, tailored analytical approaches were developed to exploit it. Accordingly, in the middle of the twentieth century Asahina and Shibata performed microcrystallizations to identify lichen metabolites visually, and started elucidating their structures3,4,5. Thin-layer chromatography (TLC) was later introduced to dereplicate lichen metabolites more precisely, and standardized migration solvent mixtures were designed in 1970 by Culberson & Kristinsson6, facilitating chemical profiling and identification of species by comparison with co-eluted standards. Although lichen chemicals were historically used as taxonomic markers, the delimitation of species based solely on chemistry was treated with caution. Major compounds and their biosynthetically related minor satellite compounds were nevertheless investigated as a means to classify lichens into chemosyndromes and to better understand the evolution of closely related taxa through their chemistry, although biomolecular data are now more commonplace7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23. Data accumulated during the 20th century were summarized in 1996 by Huneck and Yoshimura in their compendium “Identification of Lichen Substances”, comprising spectroscopic data for over 850 lichen chemicals, including TLC retardation factors, infrared data, electron impact mass spectrometry signals, UV/visible spectra and NMR landmarks24. Even though a wide range of analytical techniques has been used to study lichen metabolites25,26,27,28,29, the favoured approach among lichenologists remains TLC analysis30,31,32,33,34. 
Because of its accessibility, it is still widely used today along with spot test reactions when describing new lichen species to report the main identified metabolites35,36,37,38. However, it lacks sensitivity, relies on co-elution with standards, and metabolite identification can be challenging because such approaches are based on comparative identification and not on the generation of proper spectroscopic data. Efforts have been made to update Huneck and Yoshimura’s compendium39,40,41,42 and databases geared towards chemotaxonomy of lichens have arisen, but these are based on a limited number of metabolites, mainly major compounds accessible by TLC detection43,44,45. Aside from their traditional use in chemotaxonomy, lichen metabolites, most notably polyketides, have long been known to be specifically produced by these organisms and to have a variety of bioactive properties30,46,47,48,49,50,51,52. Although the last count of lichen metabolites53 stands at about 1050, assessing the chemical diversity sheltered by this privileged biota is rendered especially challenging by the tendency of lichens to accumulate very high amounts of a few major compounds. Moreover, a widely accepted idea is that these symbiotic systems are associated with a rather scarce chemical diversity sustained by a limited number of polyketide scaffolds. Yet, this may be related to a superficial knowledge of the specialized metabolome associated with these fascinating lifeforms. Studying lichens with modern analytical tools should shed new light on lichen chemistry. Recent reports demonstrated that atypical structures remain to be described from lichen sources, as shown by the unprecedented skeletons of tsavoenones and sanctis, both obtained last year from Vietnamese Parmotrema species54,55. 
To this effect, a molecular networking dereplication pipeline was developed based on the implementation of an in-house MS/MS database (Fig. 1), inspired by the MIADB project spearheaded by Beniddir and co-workers56. Pure lichen metabolites from Siegfried Huneck’s chemical library maintained in Berlin, in addition to an in-house chemical library from the laboratory in Rennes, were used to constitute the MS/MS database labelled LDB: Lichen DataBase. It includes MS/MS spectra for 250 small molecules ionized in ESI−, ESI+ and APCI, representing 23% of the alleged 1050 known lichen metabolites. The most common structural classes of lichen metabolites are represented, i.e. depsides, depsidones, dibenzofurans, diphenylethers, pulvinic acid derivatives, quinones, xanthones and terpenes (Fig. 1). This database sets a new standard of dereplication in lichen chemistry, regarding both detection sensitivity and structure identification reliability. This new approach also removes the need to hold libraries of reference standards, since dereplication is based on the online webserver hosted by the GNPS. This data descriptor announces the deposition of the LDB in public repositories on the GNPS57 and MetaboLights58,59 servers.

Fig. 1 LDB workflow sustained by 250 lichen metabolites as of 07/2019.

Methods

Sample sources

Samples were prepared from two sources: the Huneck chemical library at the Botanical Garden and Botanical Museum in Berlin (B), and the chemical library from the laboratory in Rennes. The Huneck library contains 1520 numbered and catalogued substances assembled by Siegfried Huneck and collaborators (including Benno Feige), together with additional extracts from his research. A complete list is available from B, compiled by Heidi Kümmerling, Stefanie Schöne, and Harrie Sipman. The chemical library from Rennes currently contains 144 lichen substances catalogued by the staff of the laboratory.

Sample preparation

Each of the 250 compounds was solubilized in HPLC-grade methanol at 1 mg/mL and placed into 1.5 mL HPLC vials prior to analysis. Solvents were purchased from Sigma-Aldrich.

Data acquisition

Samples were analysed using an Agilent 6530 Accurate-Mass Q-TOF hyphenated with an Agilent 1260 Infinity LC system. The column was a Waters SunFire C18 (50 × 4.6 mm i.d., 3.5 µm particle size) operated at a flow rate of 0.5 mL/min. Elution solvents were Milli-Q water + 0.1% formic acid (FA) (A) and HPLC-grade acetonitrile + 0.1% FA (B), and the elution gradient was as follows: 0 min at 5% B, 7 min at 100% B, 8 min at 100% B, 9 min at 5% B. Most analytes were ionized by electrospray in negative polarity; xanthones and quinones were additionally ionized in positive mode, and terpenes were ionized exclusively with an APCI source. ESI conditions were set with the capillary temperature at 320 °C, source voltage at 3.5 kV, and a sheath gas flow rate of 10 L/min. For APCI analyses, the corona current was set to 4 µA, the nebulizer pressure was 35 psig and a 10 L/min nitrogen flow heated at 350 °C was used for desolvation. Capillary, fragmentor and skimmer voltages were set to 3500 V, 175 V and 65 V, respectively. There were four scan events: negative or positive MS over a window of m/z 100–1200, followed by three data-dependent MS/MS scans of the first, second and third most intense ions from the first scan event. MS/MS settings were the following: three collision energies for the negative mode (10, 25 and 40 eV, with additional acquisitions at 2.5, 5 and 7.5 eV for depsides), three for the ESI+ and APCI modes (5, 20 and 35 eV), a default charge of 1 and an isolation width of m/z 2. Purine (C5H4N4, m/z 121.050873, positive), trifluoroacetic acid (CF3CO2H, m/z 112.98559, negative) and HP-0921 (hexakis(1H,1H,3H-tetrafluoropropoxy)phosphazene, C18H18F24N3O6P3, m/z 922.009798 (positive) and 1033.988109 (negative, trifluoroacetate adduct)) were used as internal lock masses. Full scans were acquired at a resolution of 10 000 (m/z 922) and 4 000 (m/z 121) in positive polarity, and 10 000 (m/z 1033) and 4 800 (m/z 112) in negative polarity.
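The lock-mass values quoted above can be cross-checked from the elemental formulas. The short Python sketch below is an illustration only and not part of the acquisition software. It assumes standard monoisotopic atomic masses, and it assumes that purine and HP-0921 are observed as protonated ions in positive mode, trifluoroacetic acid as its deprotonated anion, and HP-0921 as a trifluoroacetate adduct in negative mode. The function and variable names are hypothetical.

```python
from typing import Dict

# Standard monoisotopic atomic masses in u (assumed reference values).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.9949146196, "F": 18.99840322, "P": 30.97376163}
PROTON = 1.00727646688    # mass of H+
ELECTRON = 0.00054857991  # mass of one electron

def neutral_mass(formula: Dict[str, int]) -> float:
    """Monoisotopic mass of a neutral elemental composition."""
    return sum(MONO[element] * count for element, count in formula.items())

def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

PURINE = {"C": 5, "H": 4, "N": 4}
TFA_ANION = {"C": 2, "F": 3, "O": 2}  # CF3CO2-, i.e. TFA minus one proton
HP0921 = {"C": 18, "H": 18, "F": 24, "N": 3, "O": 6, "P": 3}

purine_pos = neutral_mass(PURINE) + PROTON       # [M+H]+
tfa_neg = neutral_mass(TFA_ANION) + ELECTRON     # [M-H]-
hp0921_pos = neutral_mass(HP0921) + PROTON       # [M+H]+
hp0921_neg = neutral_mass(HP0921) + tfa_neg      # [M+CF3CO2]-

for name, value in [("purine (+)", purine_pos), ("TFA (-)", tfa_neg),
                    ("HP-0921 (+)", hp0921_pos), ("HP-0921 (-)", hp0921_neg)]:
    print(f"{name}: m/z {value:.6f}")
```

Under these assumptions the computed values agree with the four reference m/z values given in the text to well within 1 ppm, and `ppm_error` can be used to express the deviation of any measured lock mass from its theoretical value.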

Database constitution

A total of 309 files in the Agilent .d format were thus generated and converted to the mzXML format using the MSConvert60 module from ProteoWizard61,62. Raw converted data were then processed using a custom script in the R 3.6.0 language63 with the MSnbase package64 to isolate the MS/MS spectra at each collision energy for a given metabolite and save each of them as a single-scan mzXML file. Additionally, a merged MS/MS spectrum was generated as a single-scan mzXML file by averaging the previously generated spectra for a given compound. Merged mzXML files were assembled into a single mgf file using a private online workflow provided by the GNPS platform and uploaded with a metadata file as an open-access MS/MS database (available at https://gnps.ucsd.edu/ProteoSAFe/gnpslibrary.jsp?library=LDB_POSITIVE for the positive mode spectra, and https://gnps.ucsd.edu/ProteoSAFe/gnpslibrary.jsp?library=LDB_NEGATIVE for the negative spectra). Each spectrum was manually curated to assess the quality of the fragmentation and the identity of fragmented metabolites.
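The merging step can be pictured with a small sketch. The original processing was done in R, and the script itself is not reproduced here, so the Python code below is only one plausible way of averaging spectra acquired at several collision energies. The m/z bin width of 0.01, the function name and the representation of a spectrum as a list of (m/z, intensity) pairs are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs

def merge_spectra(spectra: List[Spectrum], bin_width: float = 0.01) -> Spectrum:
    """Average several spectra of one compound on a common m/z grid.

    Peaks falling in the same m/z bin are pooled. A peak absent from one
    spectrum contributes zero intensity there, so the summed intensity is
    divided by the total number of spectra.
    """
    if not spectra:
        raise ValueError("at least one spectrum is required")
    intensity_sum: Dict[int, float] = defaultdict(float)
    weighted_mz_sum: Dict[int, float] = defaultdict(float)
    for spectrum in spectra:
        for mz, intensity in spectrum:
            key = round(mz / bin_width)
            intensity_sum[key] += intensity
            weighted_mz_sum[key] += mz * intensity
    merged: Spectrum = []
    for key in sorted(intensity_sum):
        total = intensity_sum[key]
        if total <= 0:
            continue  # skip empty bins
        centroid = weighted_mz_sum[key] / total  # intensity-weighted m/z
        merged.append((centroid, total / len(spectra)))
    return merged

# Hypothetical spectra of one compound at a low and a high collision energy.
low_energy = [(100.0, 10.0), (200.0, 90.0)]
high_energy = [(100.0, 50.0)]
merged = merge_spectra([low_energy, high_energy])
```

In this toy case the fragment at m/z 100 is averaged over both spectra, while the ion at m/z 200, seen only at low energy, is halved. This reflects the purpose of a merged spectrum, which is to retain both the precursor-rich and the fragment-rich information in a single library entry.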

Database description and use in molecular network-based dereplication

The LDB contains 309 merged MS/MS spectra from 250 lichen metabolites ionized in negative-ion mode ESI-MS (NI-ESI; 226 spectra), positive-ion mode ESI-MS (PI-ESI; 68 spectra) and APCI (15 spectra). Additionally, the 1011 MS/MS spectra acquired at individual collision energies are available on MetaboLights. The available standards cover a large majority of the scaffolds reported in the 1996 Huneck and Yoshimura compendium (Fig. 2a,b) and can therefore be expected to provide valuable information for lichen chemical profiling, given the high degree of structural recurrence within these organisms. A further criterion for assessing the convenience of a database is whether generic acquisition settings can cover as wide a diversity as possible of the metabolites it comprises. The occurrence of acidic functions in a vast majority of lichen metabolites prompted their analysis in negative-ion mode, consistent with former reports25. While the main analysis parameters (viz. 10/25/40 eV collision energies) afforded convenient tandem mass spectra for most structures investigated herein, two specific cases required further acquisition settings. First, depsides underwent extensive fragmentation under the aforementioned conditions. To remedy this, further analyses of this structural class were carried out at lower collision energies (i.e. 2.5/5/7.5 eV), which afforded overall better results than the former parameters. For depsides, the spectra at all six collision energies were included in the LDB. Second, one should keep in mind that electrospray ionization mainly facilitates the formation of deprotonated molecules but rarely yields radical ions65. Therefore, in specific molecular environments where phenolic groups can engage in intramolecular hydrogen bonds, their deprotonation is prevented and the molecule is not amenable to NI-ESI-MS detection66 without dedicated analytical procedures. 
Among lichen metabolites, this configuration is mostly encountered in γ-pyrone-containing metabolites, i.e. quinones, xanthones and chromones, some of which could not be detected upon NI-ESI-MS analyses. Irrespective of whether they were detected in negative polarity, however, all of them provided satisfactory MS² spectra in PI-ESI-MS, which should therefore be favoured when analysing lichen species producing these structural series. Finally, terpenes and steroids were analysed using an APCI source, which provides better results than ESI for low- to medium-polarity compounds67, although atmospheric pressure photoionization would probably extend coverage further towards non-polar compounds. The database is available on GNPS for molecular networking applications. Fig. 2c,d show two molecular networks generated with the spectra from the LDB as input; each node (or molecule) is coloured according to the chemical scaffold it represents, as described by Huneck and Yoshimura24 (parameters in Table S1). In general, the molecular network generated in negative-ion mode is larger owing to the much higher number of LDB compounds analysed in this mode, which should prevail in lichen chemistry. The molecular network obtained in positive-ion mode revealed more limited clustering, which may be related to the subset of compounds analysed in this ionization mode also being among the most scattered in negative polarity analyses. Overall, the obtained molecular networks tended to cluster according to the structural classes of the compounds. However, not every scaffold produced a single homogeneous cluster under the retained settings and, conversely, seemingly unrelated metabolites sometimes clustered together because their similar functional groups make them prone to similar neutral losses. 
Structural features accounting for the topology of molecular networks are usually not worth investigating too thoroughly, since this topology depends on the chosen cosine threshold, the stringency of which considerably affects the outcome. Nevertheless, some clustering behaviours could be rationalized in a straightforward manner. Illustrative chemical representatives of the discussed clusters are displayed in Fig. 2. An interesting example is that of depsides (Fig. 2c, in cyan), which were split into three major clusters. Cluster #2 contained depsides with three- to seven-membered alkyl side chains, whereas cluster #3 only gathered depsides lacking such lengthy side chains. Finally, cluster #6 comprised tridepsides and a single didepside (lecanoric acid), which can be readily explained by its diaromatic core being encountered in all tridepsides included in the LDB collection. A similar discrimination by side-chain length into separate clusters could be noted for depsidones (clusters #4 and #5, respectively). A last example of rational structure-dependent clustering is that of xanthones, which clustered differently depending on their monomeric or dimeric status (Fig. 2d). The lower part of cluster #1 is an example of structurally heterogeneous compounds (Fig. 2c) that nonetheless exhibited fragmentation processes close enough to belong to the same cluster. Such “erratic” structures are represented by quinones (Fig. 2c,d, in red), which produce few fragments in MS/MS and are found mostly as self-looped nodes or linked to compounds bearing similar substituents. Each molecular network presented in this article is available on the GNPS platform: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=cc4925fa2ccd43b790b708576f47e7b5 for the negative mode (Fig. 2c) and https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c79d748b515b4357a515bcec1435e6a1 for the positive mode (Fig. 2d). 
Each cluster can be found there with the same numbering as in the figure, together with each node, its spectrum and its identification. To the best of our knowledge, this is the first sizeable MS/MS database of lichen compounds.
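The dependence of network topology on the cosine threshold can be illustrated with a deliberately simplified sketch. The Python code below computes a plain binned cosine score between spectra and keeps an edge when the score reaches the cut-off. It is not the GNPS scoring routine, which additionally handles fragment shifts related to precursor mass differences and applies further filters. The bin width, function names and toy spectra are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs

def binned(spectrum: Spectrum, bin_width: float = 0.02) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    vector: Dict[int, float] = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector

def cosine(a: Spectrum, b: Spectrum, bin_width: float = 0.02) -> float:
    """Plain cosine similarity between two binned spectra (0 to 1)."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def network_edges(library: Dict[str, Spectrum],
                  threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Pairs of library entries whose cosine score reaches the threshold."""
    return [(x, y) for x, y in combinations(sorted(library), 2)
            if cosine(library[x], library[y]) >= threshold]

# Hypothetical toy library: "a" and "b" share fragments, "c" does not.
toy_library = {
    "a": [(100.0, 1.0), (150.0, 1.0)],
    "b": [(100.0, 1.0), (150.0, 1.0)],
    "c": [(300.0, 1.0)],
}
edges = network_edges(toy_library, threshold=0.6)
```

Here entry "c" remains a self-looped node at any positive threshold because it shares no fragment with the others, which mirrors the behaviour described for compounds that produce few fragments. Raising or lowering `threshold` is the simplest way to see how clusters split or merge.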

Fig. 2 LDB metrics and characteristics. Metabolites are classified according to Huneck and Yoshimura24 and the Misc. class represents acids, aliphatic and cycloaliphatic compounds, diphenylethers, mycosporines, naphthopyranes, N-containing compounds, polyols, monosaccharides and carbohydrates. Panel a: Number of representatives for each metabolite class and their relative proportions. Panel b: Proportion of metabolites covered for each class. Panels c and d: molecular networks using respectively the negative and positive mode spectra from the lichen metabolites of the LDB as input with a cosine similarity score cut-off of 0.6. Main clusters are numbered and a representative structure of the cluster is displayed.

Data Records

MS/MS data of the LDB can be found on the GNPS (merged spectra) and on MetaboLights (merged and individual collision energy spectra). They can be accessed through the GNPS library webpage (https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp), each spectrum having its own accession number: CCMSLIB00004751209 to CCMSLIB00004751434 for negative spectra and CCMSLIB00004751435 to CCMSLIB00004751517 for positive spectra. The spectral collection is also available for download from MetaboLights59 under the identifier MTBLS999.
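Because the accession numbers are given as contiguous ranges, the number of entries in each library can be checked directly. The sketch below, in Python, simply expands the two ranges quoted above. The helper name is hypothetical and no GNPS service is queried.

```python
from typing import List

PREFIX = "CCMSLIB"

def accession_range(first: str, last: str) -> List[str]:
    """Expand an inclusive range of GNPS-style accession numbers."""
    width = len(first) - len(PREFIX)  # number of zero-padded digits
    start = int(first[len(PREFIX):])
    end = int(last[len(PREFIX):])
    return [f"{PREFIX}{n:0{width}d}" for n in range(start, end + 1)]

negative = accession_range("CCMSLIB00004751209", "CCMSLIB00004751434")
positive = accession_range("CCMSLIB00004751435", "CCMSLIB00004751517")

print(len(negative), len(positive), len(negative) + len(positive))
```

The ranges expand to 226 negative and 83 positive entries, 309 in total. The negative count matches the 226 NI-ESI spectra, and 83 equals the 68 PI-ESI plus 15 APCI spectra, which would be consistent with the APCI spectra being filed in the positive-mode library (an inference from the counts, not a statement from the text).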

Metadata

Retention times (RTs) and structures for each metabolite are reported in the Supporting Information. Additional data, including MS acquisition parameters, instrument details, organism, organism part, SMILES and InChI codes, CAS numbers, ChEBI IDs and chemical formulae, are available on MetaboLights.

Technical Validation

Spectroscopic validation of LDB compounds

Structures of these metabolites were retrieved from the literature and their identity was confirmed by manual curation of the MS/MS spectra by inspecting the parent mass and fragment ions.

Molecular network-based dereplication of selected lichen acetone extracts

Technical validation was achieved by using the LDB to dereplicate extracts of well-documented lichens sampled from the herbarium of Rennes: Ophioparma ventosa (JB/14/211), Evernia prunastri (JB/13/156) and Hypogymnia physodes (JB18/234) (Table S2). Acetone extracts of the lichens were analysed by LC-MS in negative polarity and the resulting files were subjected to an MZmine-GNPS workflow (parameters in Tables S3 and S4). A molecular network was thus produced and dereplication was carried out by three different methods: (i) using only the LDB, (ii) using all GNPS spectral libraries including the LDB, (iii) using all GNPS spectral libraries excluding the LDB. The benefit offered by the LDB was assessed by comparing the hits between the three dereplication methods. The resulting molecular network is presented in Fig. 3 with nodes colour-coded according to the aforementioned dereplication methods: (i) green nodes were found exclusively in the LDB, (ii) yellow nodes were identified using all GNPS libraries including the LDB, (iii) red nodes could not be dereplicated. A total of 15 unique molecules were thus dereplicated, with 11 of them being exclusively identified with the LDB, and four being shared with other GNPS libraries. None of the hits were exclusive to other spectral libraries as all were already present in the LDB (Table 1). As annotations generated by the gap-filling function of MZmine68 (same m/z and RT) are associated with a lesser degree of confidence, the corresponding tags are ranked as 5 according to the widely accepted metabolomics confidence levels defined by Schymanski et al.69 (Fig. 4). A tentative assignment of some unannotated nodes was performed from the molecular network with the support of the SIRIUS software70. In the context of lichenology, molecular networks are particularly useful to emphasize characteristic sets of products, also designated as chemosyndromes. 
These metabolites are structurally similar because of their biosynthetic relationships and can therefore be expected to form clusters in a molecular network. Such chemosyndromes are often sustained by a major metabolite accompanied by several minor satellite compounds7.
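The comparison between the dereplication methods described in this section amounts to a set comparison of the hits obtained with and without the LDB. A minimal Python sketch is given below. The metabolite labels are placeholders, since the actual hits are listed in Table 1, and the function name is hypothetical. Only the counts (15 unique molecules, 11 exclusive to the LDB, four shared, none exclusive to other libraries) come from the text.

```python
from typing import Dict, Set

def compare_hits(ldb_hits: Set[str], other_hits: Set[str]) -> Dict[str, Set[str]]:
    """Split annotations by the library or libraries that produced them."""
    return {
        "ldb_only": ldb_hits - other_hits,    # found with the LDB alone
        "shared": ldb_hits & other_hits,      # found with LDB and other libraries
        "other_only": other_hits - ldb_hits,  # found only without the LDB
    }

# Placeholder labels standing in for the 15 dereplicated molecules.
ldb_hits = {f"metabolite_{i:02d}" for i in range(1, 16)}
# Placeholder labels for the four hits also returned by other GNPS libraries.
other_hits = {f"metabolite_{i:02d}" for i in range(1, 5)}

summary = compare_hits(ldb_hits, other_hits)
print({category: len(members) for category, members in summary.items()})
```

With real annotation tables, the two input sets would be built from the compound names returned by methods (i) and (iii), and an empty `other_only` set would correspond to the observation that no hit was exclusive to the other spectral libraries.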

Fig. 3 Molecular network generated from the acetone extracts of Ophioparma ventosa, Evernia prunastri and Hypogymnia physodes following the feature-based molecular networking workflow. Nodes with green outer circles represent ions dereplicated solely with the LDB; nodes with yellow outer circles represent ions matched by both the LDB and other GNPS libraries. Nodes with red outer circles could not be dereplicated automatically. Pie charts inside the nodes represent the proportion in which the ions were observed in each of the three lichens. Identified ions are represented with their structure, name, chemical formula and theoretical m/z value. Edges between nodes are labelled with their difference in mass.

Table 1 Metabolites annotated by the GNPS libraries.

Fig. 4 Summary of the molecular networking-based dereplication process of the extracts of Ophioparma ventosa, Evernia prunastri and Hypogymnia physodes. Metabolites dereplicated by MS/MS and reported in the literature for the studied lichen are marked by a green rectangle. Metabolites that were either identified by MS/MS but not reported in the literature, or reported in the literature but only identified by gap-filling are marked by yellow rectangles. Metabolites not reported in the literature and only identified by gap-filling are identified as grey rectangles. Metabolites reported in the literature but not identified in the analysis are shown as red rectangles. All hits are accompanied by a confidence score according to Schymanski et al.69, from 5 (same exact mass) to 1 (same MS/MS spectrum and RT as the reference standard).
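The colour code described in this caption is a small decision table, which can be written out explicitly. The Python sketch below encodes only the rules stated in the caption. The function name and boolean arguments are hypothetical, and the case of a metabolite that is neither detected nor reported is returned as `None` because the caption does not cover it.

```python
from typing import Optional

def fig4_colour(msms_match: bool, gap_filled: bool, reported: bool) -> Optional[str]:
    """Rectangle colour for one metabolite in one lichen, per the caption.

    msms_match: dereplicated through an MS/MS library match
    gap_filled: annotated only through gap-filling (same m/z and RT)
    reported:   previously reported in the literature for this lichen
    """
    if msms_match:
        return "green" if reported else "yellow"
    if gap_filled:
        return "yellow" if reported else "grey"
    return "red" if reported else None

# Cases taken from the text; the colours follow from the caption rules.
print(fig4_colour(True, False, True))    # expected metabolite matched by MS/MS
print(fig4_colour(True, False, False))   # MS/MS match that is newly reported
print(fig4_colour(False, True, True))    # known metabolite seen by gap-filling
print(fig4_colour(False, False, True))   # expected metabolite not detected
```

Under these rules, an expected metabolite matched by MS/MS (such as thamnolic acid in O. ventosa) would be green, a newly reported MS/MS hit (such as lecanoric acid in E. prunastri) yellow, a known metabolite found only by gap-filling (such as physodic acid in E. prunastri) yellow, and an expected but undetected metabolite (such as fumarprotocetraric acid in H. physodes) red.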

Dereplication of the Ophioparma ventosa extract

The chemistry of the crustose lichen Ophioparma ventosa has been investigated by several research groups over the last decades and its specialized metabolome can be regarded as rather complex. This is related to the presence of some constant metabolites (thamnolic acid, decarboxythamnolic acid, usnic acid, divaricatic acid, haemoventosin and a wealth of further pyranonaphthoquinone pigments)71,72 along with some additional compounds, such as stenosporic acid, that may or may not occur and which are, at least partly, inferred to originate from overgrown lichen species72,73,74. As the Ophioparma sample used was rather poor in apothecia, haemoventosin and its related pyranonaphthoquinones were not detected. Some recent chemical reports dealing with this lichen species, including thorough phytochemical investigations by our group28,72, suggested that the chemistry of this species is quite sharply defined.

Automatic dereplication using the LDB allowed the annotation of four out of the five metabolites expected in negative mode: thamnolic, usnic, stenosporic and divaricatic acids (Table 1). Four other compounds were detected through the gap-filling strategy (Fig. 4A). None of the latter were previously reported in the literature for Ophioparma spp.

These identified metabolites are clustered with some unidentified ones. Divaricatic and stenosporic acids are clustered with two nodes at m/z 373.1315 and 487.0743. While the latter can only be assumed to stand for an unknown depside, a likely candidate for the former is nordivaricatic acid (4-O-demethyldivaricatic acid), previously reported to co-occur with both divaricatic and stenosporic acids in several lichens, including O. ventosa (Fig. S2)73. Another noteworthy cluster for this lichen contains the ion at m/z 479.0596, seemingly a dibenzofuran metabolite related to usnic acid (Fig. S3). The joint occurrence of divaricatic acid and stenosporic acid is an illustration of a chemosyndrome, both depsides bearing side chains of moderate length, the latter being a minor satellite compound of the former73.

Dereplication of the Evernia prunastri extract

The fruticose lichen Evernia prunastri has been thoroughly investigated for its chemical content as it is widely used in the fragrance industry. The typical odour of oak moss is related to the hydrolysis of odourless depsides to yield a suite of fragrant monoaromatic compounds75. Expected compounds as historically described by Culberson76 are atranorin, chloroatranorin, evernic acid and usnic acid. Additionally, Joulain and Tabacchi published a critical review of the metabolites reported for Evernia prunastri, which reached more than 170 structures75. As mentioned by the authors, some compounds have to be considered with utmost caution, as the sources are sometimes a mixture of plant and lichen material in addition to environmental pollutants, resulting in the detection of several petroleum products by GC.

Automatic dereplication allowed the straightforward identification of all four classically described compounds (atranorin, chloroatranorin, evernic and usnic acids) in addition to lecanoric acid and perlatolic acid, the latter two metabolites being newly reported in this thoroughly studied lichen model (Table 1). Physodic acid, previously reported in Evernia prunastri77, was detected by gap-filling. Additional metabolites annotated by gap-filling were alpha-alectoronic acid and physodalic acid (Fig. 4B).

Unlabeled nodes clustered with evernic acid include two ions at m/z 365.0443 and 347.0764, seemingly sharing a depside scaffold (Figs. S4 and S5). Evernia prunastri seems to contain the same usnic acid derivative as Ophioparma ventosa at m/z 479.0596.

Dereplication of the Hypogymnia physodes extract

This foliose species is known for producing an array of structurally diverse lichen metabolites comprising various orcinol depsides and depsidones differing in the length and hydroxylation status of their side chains78. Notably, recent UHPLC-MS² based phytochemical investigations were performed on H. physodes79, so it can be inferred that a broad coverage of this lichen's chemistry is available to evaluate the degree of information conveyed by our current dereplication pipeline. Expected metabolites are usnic acid, atranorin and chloroatranorin, along with a vast range of orcinol depsidones and their derivatives (viz. physodic acid, 3-hydroxyphysodic acid, 4-O-methylphysodic acid, 2′-O-methylphysodic acid, isophysodic acid, physodalic acid, 3-hydroxyphysodalic acid, conphysodalic acid, protocetraric acid and fumarprotocetraric acid), including α-alectoronic acid, a minor metabolite recently revealed to occur within this lichen79.

All these metabolites were dereplicated except fumarprotocetraric acid and metabolites absent from the LDB, i.e., conphysodalic acid, 4-O-methylphysodic acid, 2′-O-methylphysodic acid, 3-hydroxyphysodalic acid and isophysodic acid (Table 1). Additionally, salazinic acid, previously unreported in H. physodes, was dereplicated. Conphysodalic acid can be linked to a node at m/z 417.0833 in the physodalic acid cluster (Figs. 3, S6), and 3-hydroxyphysodalic acid to a self-looped node at m/z 431.0617. Other level 5 annotations include evernic acid and perlatolic acid. Fumarprotocetraric acid, 4-O-methylphysodic acid, 2′-O-methylphysodic acid and isophysodic acid could not be detected, even in trace amounts (Fig. 4C).

Outcome of the technical validation

The implementation of the LDB into the feature-based molecular networking workflow allowed the identification of several metabolites expected in these lichens. Some additional molecules were also reported for the first time from these thoroughly studied models. These seemingly surprising results may be related to formerly unreported chemosyndromic variations7 and/or to the higher sensitivity of this analytical strategy compared with formerly used techniques. Most hits were unique to the LDB, emphasizing that most of these molecules had not previously been reported in the MS/MS libraries of the GNPS. Metabolites absent from the LDB but detected in the studied lichens could be putatively identified, while completely unknown metabolites could be detected and linked to known lichen scaffolds. These outcomes validate the adequacy of the LDB to dereplicate lichen metabolites. This network can be consulted on the GNPS platform at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=ee1285c8de3a45cda13d719271570dc7.