Application of metabolomics and molecular networking in investigating the chemical profile and antitrypanosomal activity of British bluebells (Hyacinthoides non-scripta)

Bulb, leaf, scape and flower samples of British bluebells (Hyacinthoides non-scripta) were collected regularly for one growth period. Methanolic extracts of freeze-dried and ground samples showed antitrypanosomal activity, giving more than 50% inhibition, for 20 out of 41 samples. High-resolution mass spectrometry was used in the dereplication of the methanolic extracts of the different plant parts. The results revealed differences in the chemical profile with bulb samples being distinctly different from all aerial parts. High molecular weight metabolites were more abundant in the flowers, shoots and leaves compared to smaller molecular weight ones in the bulbs. The anti-trypanosomal activity of the extracts was linked to the accumulation of high molecular weight compounds, which were matched with saponin glycosides, while triterpenoids and steroids occurred in the inactive extracts. Dereplication studies were employed to identify the significant metabolites via chemotaxonomic filtration and considering their previously reported bioactivities. Molecular networking was implemented to look for similarities in fragmentation patterns between the isolated saponin glycoside at m/z 1445.64 [M + formic-H]− equivalent to C64H104O33 and the putatively found active metabolite at m/z 1283.58 [M + formic-H]− corresponding to scillanoside L-1. A combination of metabolomics and bioactivity-guided approaches resulted in the isolation of a norlanostane-type saponin glycoside with antitrypanosomal activity of 98.9% inhibition at 20 µM.


Materials and Methods
samples. Bluebell plants used in this study were collected under licence (Grid Reference SH 257 359). Soil and above ground plant material were taken using a stainless steel square hollow section (dimensions 20 cm by 20 cm and 30 cm depth). Bluebell plants were removed from this soil sample and separated into different tissue types (bulbs, leaves, scapes, shoots and flowers). The tissue types were combined, freeze-dried (CHRIST Alpha 1-2 LD plus) and ground using pestle and mortar. Each sample contained approximately 20 individual plants. The first sample was taken in March 2014 of plants with an emerged shoot. The end of the plant's life occurred in July followed by the subterranean phase (during the end of July-October). On three occasions, samples from the time of shoot emergence and re-growth in the following growth period were also taken to account for one full growth period. Sampled plant parts are given in Table 1.
Freeze-dried and powdered plant tissues (100 mg) were extracted with MeOH on a 10 mg dry weight to 1 mL of solvent ratio. The samples were shaken at 250 rpm for 30 min. After this period, the samples were centrifuged for 30 min at 5000 rpm, the clear supernatant was separated and the solvent was evaporated. Aliquots of 1 mg of the dried extracts were transferred into LC vials for the subsequent metabolomics profiling analysis.

Metabolomics Analysis and Profiling. Liquid Chromatography -High Resolution Fourier Transform Mass
Spectrometry (LC-HRFTMS) Analysis. HPLC analysis was carried out using DionexUltiMate 3000-Thermo Scientific. Dried plant extracts (1 mg) were dissolved in 1 mL HPLC-grade MeOH. The samples were eluted through a C-18 column (ACE, 75 mm, id 3.0 mm, particle size 5 μm). The mobile phase used 0.1% formic acid in HPLC-grade water (solvent A) and acetonitrile (solvent B). The flow rate was 300 µL/min. Gradient elution was employed, commencing at 10% B, held for 5 minutes, increased to 100% B over 30 minutes, held for another 5 minutes before decreasing back to 10% B, held for 4 minutes. Solvent blank refers to sample containing solvent only which was run for the purpose of subtracting background spectra.
Mass spectrometry was carried out using an Exactive mass spectrometer with an electrospray ionization source over a mass range of 100-2000 m/z in positive and negative ionization modes with spray voltage of 4.5 kV and capillary temperature at 270 °C. LC-MS data was acquired using Xcalibur version 2.2.
Data dependent MS2 and MS3 experiments were carried out using a Finnigan LTQ Orbitrap coupled to a Surveyor Plus HPLC pump and autosampler (Thermo Fisher, Bremen, Germany) in positive and negative ionization modes using a mass range of 100-2000 m/z and 30,000 resolution. The capillary temperature was 270 °C, the ion spray voltage was 4.5 kV, the capillary voltage 35 V, the tube lens voltage 110 V and the sheath and auxiliary gas flow rates were 50 and 15, respectively (units not specified by manufacturer).

Mass Spectral Data Processing and Metabolomic
Profiling. Mass spectral data was processed using the MZmine 2.20 freeware (http://mzmine.sourceforge.net/) 25 . Throughout the data processing, the m/z tolerance used was 0.001, peaks were detected above 5.00E4 and the minimum time span and t R tolerance was 0.2 minutes. Mass detection was performed using centroid mass detector with a noise level set at 1.00E4. The peaks were filtered using FTMS shoulder peaks filter with a mass resolution of 50,000 and the Gaussian peak model.
Chromatograms for each sample and solvent blank were generated to identify the individual peaks and the solvent blank peaks were substracted from the samples. In addition to [M + H] + , Na + , K + and NH 4 + adducts were searched for after positive mode ESI. In negative mode ESI HCO 2 − adducts were identified, while for both modes CH 3 CN adducts were considered. Molecular formula prediction was performed for unknown compounds with the number of hits limited by assigning the maximum number of atoms expected of each element (C, H, O, N) along with the mass range of the putative secondary metabolites in the plant extracts. Formula prediction was accomplished using the element count heuristic rules 32 and double bond equivalence restrictions (RDBE) with the enabled isotope pattern filter. The processed positive and negative raw data were then converted to CSV files and imported into an in-house macro written on Excel 25 . Peaks with intensities 20 times greater in the samples than in the blank solvent were retained. Through EXCEL, macros were also run to match the m/z peaks with the DNP database version 2015 33 to enable dereplication of the ion peaks at a m/z threshold of ±3 ppm, which provided a list of known and unknown metabolites according to their peak intensity. This data was exported to SIMCA V 15.0 (Umetrics, Umeå, Sweden) and further analysed using both PCA and OPLS-DA with Pareto scaling. Molecular networking. LC-MS/MS data were subjected to a molecular networking analysis at Global Natural Products Social (GNPS) Molecular Networking website. Metabolite's networks were then visualized using Cytoscape 2.8 X according to the steps mentioned on their website https://bix-lab.ucsd.edu/display/Public/ Molecular+Networking+Documentation.

Isolation of saponins from bluebell flowers.
Preparation of the bluebell flower extract. Dried bluebell flowers (124 g) were extracted with MeOH (2.5 L) by soaking overnight and occasional stirring. The extraction was repeated two more times. The extracts were combined and dried down using a rotary evaporator at temperatures below 40 °C. The purple coloured residue was dissolved in water and adsorbed onto 150 mL of HP20 gel previously activated by stirring with MeOH followed by water each for 1 hr. The HP20 gel was separated from the unbound part and washed with deionised water. Then methanol was used to elute the flower extract, which was collected and dried down giving a dark purple-coloured powder (11.13 g).
Fractionation of the flower extract. A total of (9 g) of the MeOH extract was fractionated. Automated flash chromatography was carried out using Reveleris Automated Flash System where the chromatographic run was monitored with both evaporative light scattering detector (ELSD) and a UV detector. 18 portions, 500 mg each, of the pigment were individually adsorbed on (2 mL) of silica and dry loaded onto a 12 g Reveleris C18 column. Flash chromatography was carried out applying a flow rate of 18 mL/min and a default fraction volume of 25 mL for all separations. The chromatographic separation was performed using a solvent system of A: MeCN and B: H 2 O both acidified with 0.1% formic acid (FA). The method started with initial 2 min hold at 100% B, then 100-86% B (2 min), 86-80% B (60 min), 80-62% B (11 min) and held at 62% B for 5 min. Combined fractions of 4500 mg were collected at t R 75 min and used for further purification. From this fraction, 300 mg was taken and dissolved in 6 mL de-ionised water and 1 mL portions were injected onto a 12 g Reveleris C18 column. Gradient www.nature.com/scientificreports www.nature.com/scientificreports/ elution was applied using a solvent system of A: MeCN and B: H 2 O both acidified with 0.1% FA starting from 100% B for 2 min then 100-70% B in 1.5 then 70-62% in 30 min. The purified saponin (20.3 mg) was collected at t R = 11.0 min as a white powder upon solvent evaporation.
Spectrometric analysis (MS and NMR) of the isolated saponin. NMR spectra were recorded on a Bruker Avance (400 and 500 MHz) spectrometer. A set of experiments was run including 1 H, 13 C, COSY, HSQC, HMBC, HSQC-TOCSY and NOESY. Pyridine-D 5 (Cambridge Isotope Laboratories, Inc) was used as the NMR solvent and the obtained spectra were referenced using the internal signal from the solvent.
High resolution ESI-MS mass spectra for the isolated compound were obtained using Thermo-Finnigan LTQ Orbitrap (Thermo Scientific, Germany).
Biological activity. Antitrypanosomal Activity. Antitrypanosomal activity of the plant extracts was assessed using 10 4 trypanosomes per mL of Trypanosoma brucei brucei strain TC 221 cultivated in Complete Baltz Medium. Trypanosomes were tested in 96-well plate chambers against different concentrations of test substance in DMSO (0.25-40 µM). The plates were incubated at 37 °C in an atmosphere of 5% CO 2 for 24 h. After addition of 20 µL of Alamar Blue, viable cells were assessed measuring absorbance at 550 nm using an MR 700 Microplate Reader after 48 and 72 h. The effect of the test compound was quantified in IC 50 values by linear interpolation of three independent measurements 34 .
The isolated saponin was tested against the bloodstream form of Trypanosoma brucei brucei S427. Samples were initially screened at a single concentration of 20 µM (n = 3). The compound was prepared as 10 mM stock solution in 100% DMSO and diluted with HMI-9 (10% FBS) medium. Controls included a sterility control, 0.2% DMSO and a concentration range of suramin as a positive control. Suramin gave an antitrypanosomal activity against T. b. brucei at an MIC of 0.11 μM. Trypanosomes were counted using a haemocytometer and diluted to a concentration of 3 × 10 4 trypanosomes/mL (HMI-9 (10% FBS)), 100 µL of this suspension was added to each well. The assay plate was incubated at 37 °C with a humidified atmosphere containing 5% CO 2 for 48 hours. Then 20 µL of Alamar blue was added and the incubation continued for a further 24 hours. Fluorescence was then determined using a Wallac Victor 2 microplate reader (Excitation 530 nm Emission 590 nm). The results were calculated as % of the DMSO control values.

Results and Discussion
Metabolomics profiling of British bluebell extracts. To investigate a whole plant bluebell metabolome of one population, plant extracts were prepared from various plant parts collected throughout the growth period and were analysed in both positive and negative ion mode by ESI-HRMS. The PCA scores scatter plot of ESI-HRMS data (Fig. 1A) revealed the discrimination of bulb extracts from the other above ground plant extracts. Although the R 2 and Q 2 values for the PCA model were quite low (0.448 and 0.149, respectively), the scores scatter plot indicated a significant variation in the chemical profile of the bulb extracts while clearly separating the underground from the aerial parts. The PCA loadings scatter plot (Fig. 1B) showed the distribution of the metabolites (m/z) for the respective samples positioned at the same quadrant in the scores plot (Fig. 1A). The metabolites were colour-coded according to their m/z ranges on the PCA loadings scatter plot (Fig. 1B). Metabolites with relatively lower molecular weights were more abundant in the upper left quadrant representing the bulbs collected between March and May or during springtime. On the other hand, distribution of higher molecular weight metabolites were of higher density on the right quadrants of the PCA loadings plot representing the aerial plant parts that included the leaf, shoot, scape, and flower as shown. The first two principle components of the PCA accounted for a variance of 16% (R 2 X[1] = 0.162) and 12% (R 2 X[2] = 0.123), respectively. The higher variance for PC1 (R 2 X[1] = 0.162) accounted for the variation in chemical profile between the underground and aerial parts (Fig. 1A). However, the PCA plot also exhibited a significant dispersal of bulb extracts particularly between two groups involving the collections from late March to early May and those from June to early July. Their dispersal from the other bulb samples reflected the unique chemical fingerprints of the bulbs during mid-spring and the early summer months, respectively. The significance of the outlying samples was further investigated by generating a PCA-class model for the bulbs. A PCA-class score plot shown in Fig. 1C illustrated that none of the samples were considered strong outliers. A larger sized symbol was used for observation points with DModX values twice as large as the DCrit values, as demonstrated by the samples collected on the 24th of March, 16 th of April and 1 st of May. These mid-spring extracts were then interpreted as moderate outliers 35 . The DModX is the distance of the respective samples to the X plane of the model while DCrit is the critical distance to the model computed from the F-distribution to attain a 95% confidence interval limit. In simpler terms, at a significant level of 0.95, 95% of the observations should have DModX values less than the DCrit value. Dcrit limit was set at 0.05. In this study, bulb samples collected in mid-spring indicated, "these observations were different from the normal observations with respect to the correlation structure of the variables" 35 . This reflected the distinct chemical profile for those extracts, which coincided with the most active growth phase above ground while the bulb diminished prior to the formation of a new bulb 36 .
As shown in the PCA loadings plot (Fig. 1B) Table 2). These metabolites were either triterpenoids or steroids in structures. Furthermore, a molecular correlation network generated from the MS/MS data of the plant extracts (Fig. 1D) (Fig. 2) suggested the presence of strongly linked metabolites, the structures of which were acetylated congeners of the aglycone of lucilianoside D (Fig. 2) isolated from the fresh bulbs of the Japanese Muscari paradoxum 37 . Mass spectral data is presented in the Supplementary Information (Section 1 SI). The fragmentation pattern for both ion peaks involved the subsequent loss of a keto group and CO 2 that is typical for a five-membered lactone ring system 38 Fig. 1D, which could be explained by the typical steroid fragmentation pattern 39 between rings A and B indicating that the 3 oxygen atoms were on ring C or D (Fig. 2).
For a better observation of the chemical variation between the plant parts especially between the various aerial parts, a hierarchical cluster analysis (HCA) (Fig. 3A) was generated from the OPLS-DA scores plot model (R 2 = 0.948, Q 2 = 0.748; Fig. 3B). The model was validated by a permutation test (Fig. 3C). The Y intercept (Q 2 Y) on the permutation graph is a measure to check against overfitting. A clear indication that the model is valid and does not happen by coincidence is when the Q2 values of the permuted Y models are less than zero on the permutation plot test 35 . For this study, the model attained a Q 2 Y value of −0.55. Moreover, the difference between Q 2 and R 2 Y was 0.2, which is less than 0.3, indicating the absence of overfitting. For the HCA dendrogram the clusters were sorted from right to left on the horizontal axis according to increasing observation indices. While the vertical axis indicates the cluster similarity, the variance increases when clusters are merged. As shown by the HCA plot (Fig. 3A), samples were grouped into three main clusters divided into two levels. The first level of the dendrogram established the separation of the underground parts from the aerial parts. The second clustering represented by the aerial parts instigated the second level yielding two groups. The shoot and leaf parts were grouped together while the scape and the flower parts came under one group demonstrating similarities in chemical profiles of these plant parts in each respective group. The scape being a form of a long leafless flowering stem explicated its clustering with the flower parts while on the other hand a shoot system would botanically consist of the stems and leaves. The unique "biomarker" metabolites for the three main clusters have been highlighted on the OPLS-DA loadings scatter plot (Fig. 3D) and listed on Table 2 as also indicated on the chromatograms of the bioactive extracts from their respective plant parts (Fig. 4)    www.nature.com/scientificreports www.nature.com/scientificreports/ Biological investigation of plant extracts for their anti-trypanosomal activity against the blood stream form of T. brucei brucei TC221 revealed active and inactive samples. Table 3 presented the IC 50 values in μg/mL for the tested plant extracts that showed significant inhibition. Biologically active extracts were obtained from the leaves, shoots (except those from the February collection, which was not shown in Table 3), and most of the flowers (excluding the collection from the 15 th of May), two bulb samples collected in mid-April and late October and one scape extract from the mid-April collection. Flower samples demonstrated the highest percentage of growth inhibition i.e. flower samples collected at 29 th May exhibited 99.1% of growth inhibition with an IC 50 concentration of 11.08 μg/mL.
In order to pinpoint the structures that were likely to mediate the bioactivity, OPLS-DA was performed on active versus inactive extracts. The model gave good fitness with a R 2 of 0.891 but gave a low Q 2 predictability of 0.463. The initial OPLS-DA model (Fig. 4A)    www.nature.com/scientificreports www.nature.com/scientificreports/ which was found on the active left quadrant of the scores scatter plot aligning with the bulb samples collected on 29 th of October (Fig. 5A). The bulb extracts obtained from the collection at the end of October gave a weak bioactivity with an IC 50 value of 37.3 µg/mL, while it inhibited cell growth at only 62.0% at a concentration of 10 µg/mL. This weak activity of the extract was reflected by its position on the OPLS-DA loadings plot. Furthermore, the presence of a strong outlier represented by a bulb sample collected in mid-April could have also affected the Q 2 score. The within class or common internode variation R 2 X[o2] was at 0.124, which is greater than R 2 X[1] could be due to the seasonal variation resulting to differences in chemical profile of the various plant parts. The model was also validated by a permutation test (Fig. 5B). For this study, the model attained a Q 2 Y value of −0.424, which  Table 2. The other labelled peaks were those found to be the common metabolites in the bioactive extracts. www.nature.com/scientificreports www.nature.com/scientificreports/ is an indication that the model is valid. However, the difference between Q2 and R2Y was 0.428, which is greater than 0.3, which may indicate a slightly overfitted model or could have been due to the large variation within the respective groups.
In order to improve the model, the outlier (16 Apr-bulb) and between quadrants samples (9 Feb-shoot and 29 Oct-bulb) were removed, which significantly improved the Q 2 to 0.699 as shown in Fig. 5C. However, the goodness of fit (R 2 ) did not significantly changed at 0.879. This enhanced the separation between the active and inactive samples increasing the R 2 X[1] variation to 0.111 while the common internode variation R 2 X[o2] was also lesser at 0.104. For the permutation test (Fig. 5D), the model attained a Q 2 Y value of −0.358, while the difference between Q 2 and R 2 Y was decreased to 0.18 which is now less than 0.3 indicating that the model was not overfitted.
The S-loadings plot (Fig. 6) highlighted the metabolites that highly correlated to the anti-trypanosomal activity of the active extracts. A list of the putatively identified active metabolites is shown in Table 4. Hits were reduced by applying a taxonomic filter and by noting the previously reported biological activities. Metabolites in Table 4 were dereplicated against DNP database and ranked according to their p-value (probability of being responsible for the activity). On top of the list is the ion peak at m/z 1225.58619 [M + H] + eluting at 15.63 min with a predicted molecular formula of C 57 H 92 O 28 and was matched to scillascilloside E2 40 . In addition, the compound that possessed the highest covariance is the ion peak at m/z 1283.5914 (15.73 28 , and matched to scillanoside L-1 40 . Scillanoside L-1 was isolated together with scillascilloside E2 from fresh bulbs of Scilla scilloides and both were reported to exhibit variable cytotoxicity against eight cancer cell lines i.e. HT1080 (fibrosarcoma), B16 (F-10) (melanoma), 3LL (lung carcinoma), MCF7 (breast cancer), PC-3 (prostate cancer), HT29 (colon adenocarcinoma), LOX-IMVI (melanoma) and A549 (lung carcinoma) in comparison with adriamycin as a positive control 41 . The rest of the putatively active metabolites in the dereplication table corresponded to previously reported saponin glycosides with various biological activities.
The dereplication results encouraged isolation work of the metabolites from the bioactive extracts to explore the putatively active unidentified structures, which were found to be abundant in the flower and leaf extracts, and test purified compounds against T. brucei brucei. Due to the relatively higher percentage of flower samples found active against T. brucei brucei, a flower extract was chosen for the further isolation work. Bioactivity-guided isolation afforded the potent anti-trypanosomal active saponin glycoside at m/z 1445.645 [M+formate-H] − which eluted at 15.25 minutes. The compound was designated as 'unreported' in the dereplication step as no match in either DNP or the literature was found. As part of the structure dereplication process, a molecular interaction network (MN) of MS/MS data was established to find the nearest correlated structure comparable to this unknown compound. Figure 6  This saponin exhibited bioactivity when tested against blood stream Trypanosoma brucei brucei at a concentration of 20 μM corresponding to 28 μg/mL resulting in 98.9% inhibition. For the extracts showing similar inhibition between 98.2 to 99.9% and having IC 50 values between 11.1 to 23.5 μg/mL, the isolated saponin was slightly less active, suggesting synergistic effects. Iminosugars could have contributed to these as previously described for the maintenance of sapogenins in the rumen via glycosidase inhibition. Similarly that study found glucosylated saponins more active compared to the aglycones 42 .   www.nature.com/scientificreports www.nature.com/scientificreports/ A network was also found interconnecting the aglycones and showing a very strong relationship (cosine >0.9), which confirmed a similar fragmentation pattern as well as core basic structure between the respective molecules 30 . Although none of the aglycones were isolated, it was possible to predict their structure using molecular formula prediction tools from Mzmine, which employs the seven golden rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry 32,43 supported by their MS/MS fragmentation data and by comparison with possible hits from the DNP 25 . Figure S2 shows the predicted molecular structures for one of these aglycones. More detail is given in section 1 of the SI. The proposed structures were backed by the presence of close structurally related compounds in previous reports from the Hyacinthaceae 9,37,44-46 .
This study highlighted the dereplication of saponins using high resolution mass spectral data supported by their fragmentation pattern. The putatively identified congeners were statistically predicted for their bioactivity according to their presence in the active fractions and absence in the inactive fractions. This procedure is not dependent on the concentration of the individual metabolites in the respective fractions, which means a very potent metabolite at very low concentration can be designated as a bioactive metabolite. Some of these putatively identified bioactive metabolites can be present at ug or ng concentrations that are not possible to isolate them with low amount of starting material available.
Only one compound was targeted to be isolated from this work on the basis of its bioactivity and yield or its feasibility to be isolated from the available crude extract. As already mentioned above, the targeted compound did not give any "known hit" from the DNP database and was considered to be a new congener. The elucidation work is presented in the Supplementary Information. Statistically at P < 0.05 (as shown on Table 4), metabolites were predicted to be more potent than the isolated compound (P value at 0.25), one of these compounds is structurally related to scillanoside L-1 with one sugar unit less than the isolated compound. However, as again mentioned above, some of these putatively identified bioactive metabolites can be present at ug or ng concentrations that are not possible to isolate them with very low amount of starting material available. From the S-plot, it was indicated that the aglycones on their own were found to be inactive while their glycosidic counterpart were found more in the active end of the S-plot.
The other compounds described in this study were putatively identified but confirmed from their high resolution mass spectral and fragmentation data while we present in this work the hypothetical fragmentation path  Fig. S3. The high resolution data presented in this study for the dereplicated molecular formula had an average difference of 3 ppm between calculated and found mass.
In conclusion, a combination of metabolomics and bioactivity guided isolation approaches gave access to a shorter route to explore a bioactive saponin glycoside from the British Bluebell, which was simplified due to the implementation of MN for MS/MS data.