Identification and dereplication of endophytic Colletotrichum strains by MALDI TOF mass spectrometry and molecular networking

The chemical diversity of biologically active strains among 42 Colletotrichum isolates, obtained from leaves of the tropical palm species Astrocaryum sciophilum collected in pristine forests of French Guiana, was investigated. The collection was first classified on the basis of protein fingerprints acquired by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and this classification was correlated with cytotoxicity. Liquid chromatography coupled to high-resolution tandem mass spectrometry (LC-HRMS/MS) data from ethyl acetate extracts were acquired and processed to generate a massive molecular network (MN) using the MetGem software. From five Colletotrichum strains producing cytotoxic specialized metabolites, we predicted the occurrence of peptide and cytochalasin analogues in four of them by MN, including similar ion clusters highlighted by the MN algorithm provided by the MetGem software. Chemoinformatics predictions were fully confirmed after isolation of three pentacyclopeptides (cyclo(Phe-Leu-Leu-Leu-Val), cyclo(Phe-Leu-Leu-Leu-Leu) and cyclo(Phe-Leu-Leu-Leu-Ile)) and two cytochalasins (cytochalasin C and cytochalasin D) exhibiting cytotoxicity at micromolar concentrations. Finally, the chemical study of the last cytotoxic strain, BSNB-0583, led to the isolation of four colletamides bearing an identical decadienamide chain.

In the field of bioactive natural products research, the chemical diversity of microorganisms is increasingly explored 1. Microorganisms occupy virtually every living habitat on earth, and endophytes have long been thought to be a prime source of original chemical compounds 2. These symbiotic microorganisms are ubiquitous in plants, living inside the tissues of a host without causing any apparent harm 3. Tropical tree leaves are considered a hotspot for endophytes 4. Because of their high diversity, endophytes are well established as reservoirs of new bioactive specialized metabolites 5. Indeed, endophytic strains have been shown to contribute to plant defences by preventing herbivory 6 and invasion by superficial pathogens 3. Therefore, a significant proportion of endophytic extracts are thought to be cytotoxic and/or antimicrobial.
Among fungal endophytes, the genus Colletotrichum is represented by a large number of species widely distributed in the tropics but also in temperate regions 7. Some Colletotrichum are predominant in living plant tissues as symptomless endophytes 8,9. The presence of Colletotrichum tofieldiae in roots has been shown to enhance plant fitness 10, but other species are pathogenic. Indeed, Colletotrichum spp. are also among the most important groups of plant pathogens 11. Several studies have sought to advance the systematic classification of Colletotrichum and to propose better identification of the species using DNA sequencing methods 12,13. Among them, Douanla-Meli and Unger used high-throughput sequencing on selected markers for 454 Colletotrichum spp. isolates to identify strains at the species level 14. Nevertheless, there is growing interest in developing a fast and efficient method to identify and classify strains 15 in order to reduce the required cost and time. Over the last decade, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been implemented in hospital centres as the method of choice for microbiological diagnosis. This method compares microbial peptide/protein fingerprints against reference databases 16. To improve the classification of large microorganism collections, machine learning techniques, such as the Aristotle Classifier, have been introduced in the literature 17. Moreover, identification accuracy is increased by combining data from MALDI-TOF MS and FTIR spectroscopy with hierarchical clustering analysis 18.
Scientific Reports | (2020) 10:19788 | https://doi.org/10.1038/s41598-020-74852-w www.nature.com/scientificreports/
An alternative to MALDI profiling was recently introduced by the group of Takats, who demonstrated that an automated laser-assisted rapid evaporative ionisation mass spectrometry (LA-REIMS) platform can chemically screen over 450 yeast colonies in under 4 h, while simultaneously generating recoverable glycerol stocks of each colony in real time 19. MALDI-TOF MS identification was first dedicated to human pathogenic or opportunistic yeasts and bacteria, and some recent studies have developed spectral databases for a more systematic identification of filamentous fungi 20,21. Nevertheless, these databases still focus on clinical isolates. Few studies have reported MALDI-TOF MS as a tool for identifying environmental microorganisms 22 or have demonstrated a correlation between the MALDI classification of microorganisms and their production of specialized metabolites 23. In this study, forty-two endophytic and cultivable Colletotrichum strains from healthy leaves of a tropical palm species, Astrocaryum sciophilum (Miq.) Pulle, Arecaceae, were studied. A. sciophilum is a long-lived palm that grows in the understory of tropical forests of the Guiana Shield of Amazonia, and can be locally abundant 24. A methodology was developed to acquire protein MS profiles of Colletotrichum strains by MALDI-TOF MS in order to decipher the variability of our collection. Colletotrichum strains with similar phenotypes were recognized, thereby reducing the number of strains of interest. In parallel, strains were extracted with ethyl acetate (EtOAc) and each extract was tested in cell viability bioassays for cytotoxic activity. Five unique cytotoxic Colletotrichum extracts were further investigated by the molecular network approach 25-27 and the new t-SNE (t-distributed stochastic neighbor embedding) algorithm provided by the MetGem software 28. This study led to the isolation of known metabolites (cyclopeptides and cytochalasins) and of an unprecedented series of new metabolites designated as colletamides.

Results and discussion
Forty-two Colletotrichum extracts were produced from endophytes isolated from five Astrocaryum sciophilum palm specimens. We then quantified the effect of each EtOAc extract on the viability of the MRC-5 cell line (Table S1). Among the 42 extracts, twelve left less than 35% cell viability at 10 µg/mL and were designated as cytotoxic, while cell viabilities were over 65% at 10 µg/mL for the remaining ones. All strains were sequenced using a universal DNA marker of the ITS ribosomal cluster. Most strains were related to Colletotrichum gloeosporioides based on the closest ITS sequences according to the BLAST tool on NCBI (Table S1). The second or third closest sequences mostly refer to other Colletotrichum species, such as C. fructicola, C. karstii or C. vietnamense, with no clear strain clustering. A phylogenetic tree was built using the ITS sequences of each isolated strain based on a neighbour-joining analysis of an alignment of ITS sequences with 1000 bootstrap replicates (Figure S1). First, it confirmed that accurate identification at the species level remained impossible based on ITS region sequencing alone, owing to highly similar sequences in this particular genomic region.
Second, there is no obvious correlation between the ITS-based phylogeny and the biological activity profile. Moreover, strains in the same clade were not necessarily morphologically similar (Figure S2). Given this level of variability, and in order to search for bioactive metabolites in a large culture collection, we decided to generate a classification that reflects strain secondary metabolism more closely than a taxonomic classification does.
For each isolate, a protein extract was analyzed by MALDI-TOF MS. Fingerprints showed that some of our Colletotrichum isolates share the same protein profiles whereas fifteen strains are characterized by a unique one (Figure S1 and Figures S3-S22). Considering previous reports on strain identification by MALDI-TOF MS and the specificity of protein fingerprints among species 21,22, we assumed that each protein fingerprint refers to an Operational Chemically Unit (OCU, by analogy with the Operational Taxonomic Unit, or OTU, used in genetics) within the genus. This expectation has been confirmed through the classification by MALDI-TOF protein profiling and hierarchical clustering data analyses of the complete collection of environmental microorganisms at CNRS-ICSN, containing more than 1000 unique strains. Thus, from the 20 different protein fingerprints drawn from the 42 Colletotrichum isolates (Figure S1), our collection can be realigned into 18 different chemotypes, or OCUs. These OCUs (noted from Colletotrichum sp.1 to Colletotrichum sp.18) are classified following a hierarchical clustering based on their protein fingerprints (Fig. 1). The inactive isolates BSNB-0703, BSNB-0290 and BSNB-0646 do not appear in this hierarchical clustering because of the low quality of the recorded spectra.
The chemical diversity of the extracts was analyzed to prioritize the isolation of active metabolites. To this end, a massive molecular network (MN) was generated from the liquid chromatography coupled to high-resolution tandem mass spectrometry (LC-HRMS/MS) data acquired from the EtOAc extracts of our 42 Colletotrichum isolates (Fig. 2 and Figures S23-S24). To facilitate the visualization of bioactive ion clusters, each cytotoxic extract was associated with a colour on the MN (Fig. 2). For each node (a pair consisting of a particular MS/MS spectrum and a retention time), the distribution of the compound across the different extracts is depicted as a pie chart based on these colours.
Cytotoxic extracts of three OCUs, Colletotrichum sp.2, sp.14 and sp.11, displayed noticeably specific ion clusters (Fig. 2A,C). All extracts of Colletotrichum sp.1 and sp.12 shared similar ion clusters in the MN (Fig. 2B). In addition, although BSNB-0627 was the only bioactive strain from the OCU Colletotrichum sp.6 sub-collection, its chemical profile did not differ from those of isolates with the same protein fingerprint, such as BSNB-0625 and BSNB-0628 (Fig. 2D). The MN did not explain the different biological activities among the OCU Colletotrichum sp.6 strains. BSNB-0627 was therefore excluded from further chemical investigation. As OCU Colletotrichum sp.1 and sp.12 had the same metabolite composition, we finally focused on four different metabolite profiles coming from the five cytotoxic taxa. The study of OCU Colletotrichum sp.14 and sp.11 extracts was undertaken by searching for known compounds and analogues in the MS/MS databases available from the MetGem software 26,29. Thanks to this methodology, cyclopeptide analogues could be annotated (Fig. 2C and 3). After a scaled-up culture of BSNB-0580, three cyclopeptides were isolated from the EtOAc extract and identified by NMR, MS/MS analysis and manual fragmentation assignments. The amino acid composition was deduced from the immonium ions in the low mass range of the MS/MS spectra 30: ions at m/z 86.097 (leucine/isoleucine), m/z 72.081 (valine) and m/z 120.081 (phenylalanine) were unambiguously attributed. The structures of the isolated cyclopeptides cyclo(Phe-Leu-Leu-Leu-Val) (1), cyclo(Phe-Leu-Leu-Leu-Leu) (2) and cyclo(Phe-Leu-Leu-Leu-Ile) (3) were then refined by NMR (Fig. 4).
The relative configuration of cyclo(Phe-Leu-Leu-Leu-Ile) was determined by X-ray crystallography (Figure S81). By analysing the clusters of nodes related to these cyclopeptides (Fig. 4), we observed that two isomers of compounds 2 and 3 can be produced by the strain BSNB-0580, whereas three isomers of compound 1 were also produced, most likely with differences in the amino acid sequence or in the leucine/isoleucine proportion. Furthermore, two nodes at m/z 600.38 and 586.36 matched peptides bearing oxidized methionine, owing to the detection of an immonium ion signal at m/z 120.048. Thus, we can assume that the first compound is composed of one oxidized methionine and four leucines (or isoleucines), and the second of one oxidized methionine, one valine and three leucines (or isoleucines). Moreover, other peptides composed of various numbers of phenylalanines, leucines/isoleucines and valines were annotated based on the HRMS and the MS/MS fragmentation. Finally, nodes belonging to peptides bearing non-proteinogenic amino acids were annotated based on unusual m/z values of immonium ions and differences between b- and y-ion products. These cyclopeptides have not been isolated and formally identified because of their low abundance in the strain extract.
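The immonium-ion assignments described above can be illustrated with a short calculation. The sketch below is not the authors' workflow: it assumes standard monoisotopic residue masses and a hypothetical matching window of 0.005 Da, and only shows why the signals at m/z 120.081 (phenylalanine) and m/z 120.048 (oxidized methionine) can be told apart at high resolution.

```python
# Monoisotopic residue masses (Da); standard values, not taken from the paper.
RESIDUE_MASS = {
    "Val": 99.06841,
    "Leu/Ile": 113.08406,
    "Phe": 147.06841,
    "Met(O)": 147.03540,
}
CO = 27.99491     # carbon monoxide lost from the residue
PROTON = 1.00728  # charge carrier


def immonium_mz(residue_mass):
    """m/z of the singly charged immonium ion of an amino acid residue."""
    return residue_mass - CO + PROTON


def assign_immonium(observed_mz, tol=0.005):
    """Residues whose immonium ion lies within `tol` Da (hypothetical window)."""
    return [name for name, mass in RESIDUE_MASS.items()
            if abs(immonium_mz(mass) - observed_mz) <= tol]


# Immonium ion values quoted in the text.
for mz in (72.081, 86.097, 120.081, 120.048):
    print(mz, assign_immonium(mz))
```

With this window, the two ions near m/z 120 are assigned to different residues, since their theoretical values differ by about 0.033 Da.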
The MN approach was unable to annotate metabolite clusters specifically related to OCU Colletotrichum sp.12 and sp.1 (Fig. 2B). Indeed, the MN approach ranks the clusters by size, from the largest to the smallest (self-loop), but does not give access to the inter-cluster distance. To overcome this limitation, data analysis using the . The spectroscopic data for these compounds were identical to those provided in the literature 45,50. These results demonstrate the interest of the t-SNE prediction module for propagating node annotations. Finally, according to the classical and t-SNE MN, the extract of OCU Colletotrichum sp.2 (BSNB-0583) contained specialized metabolites specific to this strain, distributed into two related clusters. Targeted isolation of the strain-specific metabolites provided four compounds (7-10) in pure form (Fig. 4). The molecular formula of compound 7 was determined as C 19 . According to NMR data analysis (see supplementary information), compound 7 was therefore analogous to aranorosin 31. It has previously been referenced in a patent as a potential inhibitor of 3-hydroxy-3-methyl-glutaryl-coenzyme A reductase 32. However, the source organism was claimed to be an unidentified fungus (ATCC 20953), the stereochemistry was not specified and the only analytical data provided was a list of 13C NMR chemical shifts. This list suggests that their compound C and our compound 7 may be identical. We determined the relative configuration of compound 7 by combining NOE NMR data with the DP4 approach 33 (see supplementary information). A very high probability (99.5%) was obtained for the relative configuration 2′S*,4′S*,5′S*,8′S*-7 (Figure S48). The absolute configuration of 7, 2′S,4′S,5′S,8′S, was deduced by comparison of the predicted and experimental ECD spectra (Figure S54). Compound 7 was named colletamide A.
Using the same structural elucidation process, compounds 8  The absolute configurations of 9 and 10 were assumed to be 2′R,4′S,5′S and 3′S,4′S, respectively, by comparison with 7 and 8 and taking into account that all the isolated compounds were most probably biosynthesized from an L-phenylalanine precursor.
Finally, all isolated compounds were tested for their potential cytotoxicity on the MRC-5 cell line. The cytotoxic activities of cyclopeptides 1 to 3 were confirmed, with IC50 values between 1.40 and 17.7 µM (Table S6). The IC50 values of cytochalasins C and D were 3.01 and 1.97 µM, respectively.
Colletamides A-D did not exhibit any cytotoxicity (IC50 > 100 µM). This novel series of decadienamides was therefore not responsible for the activity of the OCU Colletotrichum sp.2 extract, in agreement with the loss of activity of the second EtOAc extract of BSNB-0583 (OCU Colletotrichum sp.2). Indeed, the second extract (BSNB-0583) exhibited 63 ± 4% viability on MRC-5 cells (10 µg/mL), therefore showing reduced cytotoxicity compared to the initial extract (32 ± 1%). Variation in metabolite production is a common phenomenon already described in the literature 1. Comparison of the LC-MS profiles of the cytotoxic and the non-cytotoxic extracts shows a difference in intensity for the ion at m/z 346.2738. This ion belongs to a cluster of nodes specific to the BSNB-0583 strain in the MN and could be responsible for the altered biological activity of the extract. Curiously, the metabolite at m/z 346.2738 was not recovered from any fraction after the purification process, suggesting that it could be sensitive to purification conditions. Nevertheless, the chemical formula C 22 (1)  strains of this genus 35-39. Colutellin A, an antimycotic peptide, has previously been isolated from Colletotrichum dematium 40. The presence of such peptides can explain the cytotoxic activity of the strain BSNB-0580. Indeed, cyclopeptides isolated from microorganisms and showing toxic activities have been extensively described in the literature 41.
The cytotoxic activity of cytochalasins and their mechanisms of action have been intensively studied. They are known to inhibit the polymerization of actin and to disrupt cellular morphology and cell division 42. More specifically, cytochalasin D inhibits protein synthesis in HeLa cells 43, and cytochalasin E is a potent antiangiogenic agent owing to the presence of an epoxide 44. As far as we know, this is the first time that cytochalasins have been isolated from Colletotrichum species. Previous studies report their presence in the genera Xylaria 45,46, Diaporthe/Phomopsis 47, Aspergillus 48 and Phoma 49. Finally, hirsutatin A was first isolated from the entomopathogenic fungus Hirsutella nivea and showed no cytotoxic activity against Vero cell lines 50.
Compound 7 had already been isolated from the fungal strain Pseudoarachniotus roseus, as reported notably in a patent for antihypercholesterolemic and antimicrobial activities 51, and from Chaetomium cupreum 52. Aranorosin, another analogue, has also previously been isolated from P. roseus. Chaetocuprum and 10,11-epoxychaetocuprum have been obtained from the endophytic fungus C. cupreum 53. Finally, other lipoamino acids consisting of fatty acid derivatives of phenylalanine (similar to compound 9) have already been isolated from bacteria of the genus Pantoea 54. All of these compounds are known to have antibacterial activity, but no cytotoxicity has been reported in the literature.

Conclusion
The newly generated collection of 42 endophytic Colletotrichum strains has been chemically investigated with up-to-date analytical tools, i.e. MALDI-TOF profiling and molecular networking dereplication, to identify cytotoxic secondary metabolites. We found that Colletotrichum strains produce specific specialized metabolites depending on the Operational Chemically Unit (OCU) highlighted by the MALDI-TOF MS clustering. This has finally led to the isolation and complete de novo characterisation of 10 Colletotrichum specialised metabolites.
Cytochalasins and peptides have been isolated from three cytotoxic extracts (BSNB-0615, -0652 and -0580), but chemical diversity remains unknown for the 8 remaining cytotoxic extracts. At this point, this lack of information may be due to incomplete MS/MS databases or to the fact that the extracts are actually composed of currently unknown specialized metabolites.
Using the t-SNE representation of the MS/MS data, which allows inter-cluster distances to be evaluated, we demonstrated that OCU Colletotrichum sp.4 produces metabolites related to those of Colletotrichum sp.11, especially cytochalasin analogues. Indeed, the chemical study of one strain of OCU Colletotrichum sp.4 (BSNB-0536) led to the isolation of cytochalasin metabolites.
Finally, OCU Colletotrichum sp.2 (BSNB-0583) was found to be the strain most likely to yield new and bioactive metabolites. Chemical study of the extract led to the isolation of a new series of metabolites containing a decadienamide chain. These metabolites are named colletamides A-D and seem to have, like their known analogues, antimicrobial activities. They are the major components of the extract, but minor members of this series also exist, according to the MN.
The methodology presented in this study targets promising bioactive strains: we distinguished different Operational Chemically Units of Colletotrichum by MALDI-TOF MS fingerprinting of genetically close isolates and observed their chemical diversity by molecular networking. Applying this approach to many strains exhibiting various biological activities could accelerate the discovery of novel bioactive metabolites.

Experimental section
General experimental procedures. NMR spectra were recorded in CD3OD or (CD3)2SO on a Bruker 500 MHz spectrometer or a Bruker 600 MHz spectrometer equipped with a 2 mm inverse detection probe. Chemical shifts (δ) are reported in ppm relative to the TMS signal. Coupling constants (J) are in hertz. High-resolution ESI-TOF-MS measurements were performed using a Waters Acquity UPLC system with column bypass coupled to a Waters Micromass LCT Premier time-of-flight mass spectrometer equipped with an electrospray interface (ESI). Flash chromatography was performed on a Grace Reveleris system equipped with a 120 g C18 column or a 40 g C18 column 55. The flow rate was 80 mL/min or 40 mL/min, respectively, and detection was performed with dual UV at 210 and 270 nm and ELSD. Analytical and preparative HPLCs were conducted on a Gilson system equipped with a 322 pumping device, a GX-271 fraction collector, a 171 diode array detector and a prepELSII detector electrospray nebulizer. Wavelengths for UV absorbance were 210, 220, 254, 270 and 320 nm. Columns used for analytical and preparative experiments were a Phenomenex Luna C18 5 µm 4.6 × 250 mm and a Phenomenex Luna C18 5 µm 21.2 × 250 mm, respectively. The flow rate was 1 mL/min for analytical and 21 mL/min for preparative experiments. All solvents were HPLC grade and HPLC-grade water was obtained with a Milli-Q water purification system (Synergy, Merck). For UHPLC-HRMS analysis, chromatographic separation was performed on an Acquity UHPLC system interfaced to a Q-Exactive Plus mass spectrometer (Thermo Scientific, Bremen, Germany) with a Waters BEH C18 50 × 2.1 mm × 1.7 μm analytical column, using a heated electrospray ionization (HESI-II) source. Detection was performed with an Acquity UPLC photodiode array detector from 200 to 500 nm.
MALDI-TOF MS analyses were performed using an UltrafleXtreme mass spectrometer (Bruker Daltonics, Bremen). Acquisitions were performed in linear positive ion mode. The laser intensity was set just above the ion generation threshold to obtain peaks with the highest possible signal-to-noise (S/N) ratio without significant peak broadening. The mass spectrometer was externally calibrated using a mixture of proteins (Insulin, Cytochrome C, Myoglobin and Ubiquitin I). All data were processed using the program FlexAnalysis (Bruker Daltonics, Bremen).

Isolation and purification of endophytic strains. Astrocaryum sciophilum palm trees were sampled in French Guiana at Piste de Saint-Elie, Sinnamary, in July 2014. The general procedures adopted for isolation of the microorganisms followed the methodology described by Casella et al. 56. After collection, the plant material was washed with sterile water and surface sterilised by sequential immersion in 70% aqueous ethanol (3 min), followed by 5% aqueous sodium hypochlorite (5 min) and finally by 70% aqueous ethanol (1 min). Leaves were cut into small pieces (0.5-1 cm2), which were placed on Potato Dextrose Agar medium (PDA, Fluka Analytical, Germany) in Petri dishes at 28 °C (4-5 pieces per Petri dish). Each individual hyphal tip of emerging fungi was removed and placed on sterile PDA culture medium in 10 cm Petri dishes. The leaf fragments were cultured for a maximum of one month. All isolated endophytic strains have been deposited in the ICSN/CNRS strain library (France). Strains are maintained in triplicate in 2 mL Eppendorf tubes containing 1 mL of a solution of glycerol and water (1:1) at −80 °C.
Phylogenetic analyses. Fungal strains were identified by nucleotide sequencing of the rDNA ITS region (ITS1-5.8S-ITS2). ITS sequences were blasted on NCBI GenBank (accessed 2018-03-12) to search for the closest strains (see Supplementary Table S2). 16 57 . The resulting alignment was saved in Phylip4 format. A phylogenetic tree was constructed by maximum likelihood (ML) inference on the CIPRES server (https://www.phylo.org) 58. The ML analysis was performed using RAxML-HPC2 on XSEDE (8.2.10). Branch support was evaluated by a bootstrapping method with 1000 replicates. The final phylogenetic tree was visualized using FigTree version 1.4.4. The sequence data have been submitted to GenBank with an accession number for each strain (see Supplementary Table S1).

MALDI-TOF analysis.
Protein extractions were performed in triplicate according to the acetonitrile/formic acid extraction procedure of the Bruker MALDI Biotyper 20. After culture for 72 h at 28 °C on PDA (Potato Dextrose Agar) medium, 10 mg of each isolate was suspended in 300 µL of water and 900 µL of 100% ethanol, then centrifuged at 13,000 rpm. The supernatant was discarded and the pellet was resuspended in 40 µL of formic acid and vortexed. An equal volume of 100% acetonitrile was then added and the solution mixed. After centrifugation at 13,000 rpm, the supernatant was stored at −20 °C for a maximum of one week and used for MALDI measurement. For MALDI analysis, 1 µL of protein extract was spotted on the sample target, directly followed by 1 µL of α-cyano-4-hydroxycinnamic acid (CHCA) matrix (50 mg/mL in CH3CN/H2O (50/50), acidified with 0.1% trifluoroacetic acid).
Hierarchical cluster tree. A hierarchical cluster tree was constructed using the Mass-Up software 59. Each MALDI-TOF MS spectrum was preprocessed using the following parameters: intensity was square-root transformed, smoothing was done using a moving average and baseline correction using TopHat. Peak detection was set to MassSpecWavelet with a minimum peak intensity chosen in order to detect only the major peaks. Peak matching on peak lists was done with a tolerance of 700 ppm for intra-sample matching. Clustering analysis was generated using the Jaccard distance function and conversion of values to presence.
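The distance step of this procedure can be sketched as follows. This is a minimal illustration and not the Mass-Up implementation: peaks are matched pairwise within 700 ppm, the spectra are then compared as presence/absence data with the Jaccard distance, and the peak lists and strain names are hypothetical. The linkage method used to build the tree is not stated above, so the sketch stops at the pairwise distances.

```python
def matches(mz_a, mz_b, ppm=700.0):
    """True if two peak masses agree within a relative tolerance in ppm."""
    return abs(mz_a - mz_b) <= ppm * 1e-6 * max(mz_a, mz_b)


def jaccard_distance(peaks_a, peaks_b, ppm=700.0):
    """Jaccard distance between two peak lists treated as presence/absence data."""
    unmatched_b = list(peaks_b)
    shared = 0
    for mz in peaks_a:
        for candidate in unmatched_b:
            if matches(mz, candidate, ppm):
                unmatched_b.remove(candidate)  # each peak is matched at most once
                shared += 1
                break
    union = len(peaks_a) + len(peaks_b) - shared
    return 1.0 - shared / union if union else 0.0


# Hypothetical peak lists (m/z), for illustration only.
spectra = {
    "strain_A": [4012.0, 5230.5, 6781.2, 8105.0],
    "strain_B": [4013.5, 5231.0, 6781.0, 9500.0],
    "strain_C": [3500.0, 7200.0, 9499.0],
}
names = sorted(spectra)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(jaccard_distance(spectra[a], spectra[b]), 3))
```

Isolates sharing most of their major peaks obtain a small distance and would fall into the same branch of the tree, whichever linkage is applied afterwards.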
Cultures and extraction of secondary metabolites. Each strain was cultivated at 28 °C in 10 Petri dishes (10 cm diameter) of PDA (Potato Dextrose Agar) medium. The culture medium was then extracted with ethyl acetate (EtOAc) at room temperature for 24 h. The organic phase was separated by filtration, washed three times with H2O, dried over anhydrous solid Na2SO4 and evaporated using a rotary evaporator under reduced pressure to yield a crude mixture.

UPLC-MS/MS and data analysis. Each EtOAc extract of a Colletotrichum strain was profiled by UPLC-MS/MS to acquire mass data for the construction of the molecular network 55. Samples were pre-treated by solid phase extraction using a Discovery® SPE 96-well plate (bed wt 100 mg/well) and a plate prep vacuum manifold. Each extract was then dissolved in 500 µL of 5% milliQ water in HPLC methanol and loaded on the cartridge bed. Eluates were injected at a concentration of 4 mg/mL for Orbitrap analysis. For UHPLC-HRMS analysis, chromatographic separation was performed on an Acquity UHPLC system interfaced to a Q-Exactive Plus mass spectrometer (Thermo Scientific, Bremen, Germany), using a heated electrospray ionization (HESI-II) source. Thermo Scientific Xcalibur 2.1 software was used for instrument control and data analysis. A 1 µL aliquot of each sample was injected and eluted at 0.6 mL/min. The linear gradient between CH3CN and H2O (0.1% formic acid modifier) ran from 5 to 100% acetonitrile over 7 min, followed by an isocratic step at 100% CH3CN for 1 min. In positive ion mode, the di-isooctyl phthalate C24H38O4 [M + H]+ ion (m/z 391.28429) was used as an internal lock mass. The mass analyzer was calibrated using a mixture of caffeine, methionine−arginine−phenylalanine−alanine−acetate (MRFA), sodium dodecyl sulfate, sodium taurocholate, and Ultramark 1621 in an acetonitrile/methanol/water solution containing 1% formic acid by direct injection.
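The lock mass value quoted above can be checked with a short calculation. The sketch below is only an illustration: it assumes standard monoisotopic atomic masses (not given in the text), handles simple formulas without brackets, and the `ppm_error` helper is a hypothetical name introduced here to express a deviation from the lock mass.

```python
import re

# Monoisotopic atomic masses (u); standard values, not taken from the paper.
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # mass of H+ (hydrogen atom minus one electron)


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C24H38O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOM_MASS[element] * (int(count) if count else 1)
    return total


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# [M + H]+ of di-isooctyl phthalate, C24H38O4.
lock_mz = monoisotopic_mass("C24H38O4") + PROTON
print(round(lock_mz, 5))
```

The computed value agrees with the m/z 391.28429 given for the lock mass.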
The optimized HESI-II parameters were as follows: source voltage, 3.5 kV (pos); sheath gas flow rate (N2), 55 units; auxiliary gas flow rate, 15 units; spare gas flow rate, 3.0; capillary temperature, 275.00 °C (pos); S-Lens RF Level, 45. The data-dependent MS/MS events were performed on the four most intense ions detected in full scan MS (Top3 experiment). The MS/MS isolation window width was 1 Da, and the normalized collision energy (NCE) was set to 35 units. In data-dependent MS/MS experiments, full scans were acquired at a resolution of 35 000 FWHM (at m/z 200) and MS/MS scans at 17 500 FWHM, both with a maximum injection time of 50 ms. After being acquired in an MS/MS scan, parent ions were placed in a dynamic exclusion list for 2.0 s.
MZmine 2 data-preprocessing parameters. The complete procedure has been Raw files were converted into mzXML files using MSConvert software. The mzXML files were then processed using MZmine 2.37 60,61. Mass detection was performed with the centroid mass detector, with the noise level set to 1.0E5 and the MS level set to all. The ADAP chromatogram builder 62 was run using a minimum group size of scans of 5, a minimum group intensity threshold of 1.0E5, a minimum highest intensity of 1.0E5 and an m/z tolerance of 0.002 or 5 ppm. The wavelets (ADAP) algorithm was used for chromatogram deconvolution with the following settings: S/N threshold of 10, intensity window SN, minimum feature height of 1000, coefficient area threshold of 100, peak duration range between 0.01 and 0.5 and RT wavelet range between 0.001 and 0.05. The m/z and RT ranges for MS/MS scan pairing were set to 0.001 Da and 0.05 min, respectively.
Chromatograms were deisotoped using the isotopic peaks grouper algorithm with an m/z tolerance of 0.003 (5 ppm), an RT tolerance of 0.1 (absolute) and a maximum charge of 2; the representative isotope used was the most intense. Peak alignment was performed using the join aligner method: m/z tolerance of 0.001 or 5.0 ppm, weight for m/z of 0.001, RT tolerance of 0.3 min, weight for RT of 0.1. Adduct search (Na + , K + , NH 4 + , CH 4 CN + ) was conducted on the peak list with an RT tolerance set to 1.0 min and the maximum relative peak height at 50%. Adducts found were then removed from the peak list. The peak list was gap-filled with the peak finder module: intensity tolerance of 90%, m/z tolerance of 0.001 or 5.0 ppm and RT tolerance of 0.1 min. The peak list was exported for GNPS to create an mgf file (https://doi.org/10.5281/zenodo.3862573). Row ID, row m/z, row retention time and peak area were exported in an associated CSV file. After a first analysis of the generated MN, this peak list was reduced to the ions with m/z between 200 and 900 to diminish the data size.
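Tolerances expressed as "0.001 or 5.0 ppm" combine an absolute and a relative term. The sketch below assumes that the larger of the two is applied at a given m/z, which is our reading of how MZmine 2 combines them and should be checked against the MZmine documentation; the function names are hypothetical.

```python
def mz_window(mz, abs_tol=0.001, ppm_tol=5.0):
    """Accepted m/z interval, assuming the larger of the two tolerances applies."""
    half_width = max(abs_tol, mz * ppm_tol * 1e-6)
    return mz - half_width, mz + half_width


def within_tolerance(mz_a, mz_b, abs_tol=0.001, ppm_tol=5.0):
    """True if mz_b falls inside the tolerance window centred on mz_a."""
    return abs(mz_a - mz_b) <= max(abs_tol, mz_a * ppm_tol * 1e-6)


# Limits of the m/z range retained for the network (200 to 900).
for mz in (200.0, 900.0):
    low, high = mz_window(mz)
    print(mz, round(high - low, 4))
```

Under this assumption the two terms coincide at m/z 200 (5 ppm of 200 is 0.001), so the ppm term governs the window over the whole 200 to 900 range retained for the network.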

Molecular network analysis.
After the preprocessing of the LC-MS/MS data with MZmine 2.37, the output mgf file was processed with the MetGem software 28 to give a network containing 15,261 nodes. The molecular network was generated using the following parameters: m/z tolerance set to 0.02, Minimum Matched Peaks set to 6, topK set to 10, Minimal Cosine Score Value of 0.7 and Max. Connected Component Size of 100. The associated CSV file was then loaded. For the mapping process, the relative quantification of each ion was represented as a pie chart, whose proportions were based on the respective areas of the corresponding extracted ion chromatograms (EIC). The spectra in the network were then searched for analogues against the available spectral libraries. The library spectra were filtered in the same manner as the input data. All matches kept between network spectra and library spectra were required to have a score above 0.7 and at least 6 matched peaks. The m/z tolerance for the search of analogues was set to 100. For the t-SNE output, the cosine score threshold (0.7) was the same as for the molecular network view. The number of iterations, perplexity, learning rate and early exaggeration parameters were set to 1000, 6, 200 and 12, respectively. These values were previously optimized on another complex data set 28.
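The edge criterion of the network (fragment m/z tolerance of 0.02, at least 6 matched peaks, cosine score of at least 0.7) can be illustrated with a simplified score. The sketch below is not the MetGem implementation: it computes a plain cosine with greedy peak matching, ignores the precursor mass shift used by modified cosine scores, and applies a square-root intensity scaling that is an assumption of this example. The spectrum is hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, mz_tol=0.02):
    """Greedy cosine score between two spectra given as (m/z, intensity) lists.

    Returns (score, number_of_matched_peaks). Intensities are square-root
    scaled, which is an assumption of this sketch.
    """
    a = [(mz, math.sqrt(i)) for mz, i in spec_a]
    b = [(mz, math.sqrt(i)) for mz, i in spec_b]
    norm_a = math.sqrt(sum(i * i for _, i in a))
    norm_b = math.sqrt(sum(i * i for _, i in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    # Candidate peak pairs within tolerance, highest intensity product first.
    pairs = sorted(
        ((ia * ib, x, y)
         for x, (ma, ia) in enumerate(a)
         for y, (mb, ib) in enumerate(b)
         if abs(ma - mb) <= mz_tol),
        reverse=True,
    )
    used_a, used_b, dot, matched = set(), set(), 0.0, 0
    for product, x, y in pairs:
        if x not in used_a and y not in used_b:  # each peak is used once
            used_a.add(x)
            used_b.add(y)
            dot += product
            matched += 1
    return dot / (norm_a * norm_b), matched


def is_edge(spec_a, spec_b, min_cosine=0.7, min_matched=6):
    """Apply the two thresholds used to connect nodes in the network."""
    score, matched = cosine_score(spec_a, spec_b)
    return score >= min_cosine and matched >= min_matched


# Hypothetical MS/MS spectrum as (m/z, intensity) pairs.
example = [(72.081, 100.0), (86.097, 800.0), (120.081, 300.0),
           (199.180, 50.0), (233.165, 120.0), (346.270, 60.0)]
print(cosine_score(example, example), is_edge(example, example[:5]))
```

The second call shows the effect of the Minimum Matched Peaks setting: two spectra sharing only five fragments are not connected even when their cosine score is high.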

Large-scale cultivation and isolation.
To isolate the metabolite that gives the ion at m/z 346.275, a second large-scale cultivation of BSNB-0583 was performed on 169 Petri dishes. A crude extract of 1.9 g was obtained and a liquid-liquid partition (MeOH/hexane) was carried out on 0.8 g. The 0.5 g recovered from the MeOH phase was further fractionated by reversed-phase flash chromatography on a C18 column (12 g, flow rate 30 mL/min). For the elution, a linear gradient was performed between water and acetonitrile modified with 0.1% formic acid: 25/75 for 3 min, 25/75 to 0/100 in 10 min and 0/100 for 6 min. Thanks to the ELSD detection, nine zones were collected and analysed by mass spectrometry. The fraction 6 (1.
X-ray analysis. X-ray diffraction data for cyclo-(Phe-Leu-Leu-Leu-Ile) were collected using a VENTURE PHOTON100 CMOS Bruker diffractometer with a micro-focus IµS source and Cu Kα radiation. The crystal was mounted on a CryoLoop (Hampton Research) with Paratone-N (Hampton Research) as cryoprotectant and then flash-frozen in a nitrogen-gas stream at 100 K 64. The temperature of the crystal was maintained at the selected value by means of an N-Helix cooling device to within an accuracy of ± 1 K. The data were corrected for Lorentz-polarization and absorption effects. The structure was solved by direct methods using SHELXS-97 65 and refined against F2 by full-matrix least-squares techniques using SHELXL-2018 66 with anisotropic displacement parameters for all non-hydrogen atoms. Hydrogen atoms were introduced into the calculations as a riding model with isotropic thermal parameters. All calculations were performed using the Crystal Structure crystallographic software package WINGX 67. The crystal data collection and refinement parameters are given in Table S7. CCDC 1944171 contains the supplementary crystallographic data for this paper. These data can be obtained free of charge from the Cambridge Crystallographic Data Centre via https://www.ccdc.cam.ac.uk/Community/Requestastructure.
Computational details. All calculations were performed using Gaussian 16W. Prior to DFT calculations, a conformational analysis was performed using the GMMX plugin. Each conformer was optimized using DFT at the B3LYP/6-31G(d) level. Frequency calculations were performed at the same level of theory. Rotational strengths were calculated at the B3LYP/6-311+G(d,p) level for 20 excited states. ECD spectra were plotted using GaussView 6. GIAO NMR properties were predicted at the MPW1PW91/6-311+G(2d,p) level. Theoretical and experimental NMR chemical shifts were compared using common metrics such as linear correlation (R2), mean absolute error (MAE) and DP4 probability 33.