Metabolic profiling of cytotoxic metabolites from five Tabebuia species supported by molecular correlation analysis

Tabebuia is the largest genus among the family Bignoniaceae. Tabebuia species are known for their high ornamental and curative value. Here, the cytotoxic potential of extracts from the leaves and stems of five Tabebuia species was analyzed. The highest activity was observed for T. rosea (Bertol.) DC. stem extract against HepG2 cell line (IC50 4.7 µg/mL), T. pallida L. stem extract against MCF-7 cell line (IC50 6.3 µg/mL), and T. pulcherrima stem extract against CACO2 cell line (IC50 2.6 µg/mL). Metabolic profiling of the ten extracts using liquid chromatography–high-resolution mass spectrometry for dereplication purposes led to annotation of forty compounds belonging to different chemical classes. Among the annotated compounds, irridoids represent the major class. Principle component analysis (PCA) was applied to test the similarity and variability among the tested species and the score plot showed similar chemical profiling between the leaves and stems of both T. pulcherrima and T. pallida L. and unique chemical profiling among T. rosea (Bertol.) DC., T. argentea Britton, and T. guayacan (Seem.) Hemsl. leaf extracts and the stem extract of T. rosea (Bertol.) DC. Additionally, a molecular correlation analysis was used to annotate the bioactive cytotoxic metabolites in the extracts and correlate between their chemical and biological profiles.


Scientific Reports
| (2021) 11:8405 | https://doi.org/10.1038/s41598-021-87695-w www.nature.com/scientificreports/ for discovering anticancer candidates. About 60% of drugs used nowadays to treat cancer were isolated from natural products, and more than 3000 plants have been reported to have anticancer activity 10 . However, natural extracts require complicated process to obtain pure identified natural product and liquid chromatography-mass spectrometry is powerful analytical tool for assist the study of plant extracts; it detects a wide range of chemical compounds simultaneously without the need for a tedious isolation procedure 11 . In addition, dereplication process, perform a rapid annotation of known secondary metabolites and their quantification in crude extracts using database searching, to screen samples, which saves time and reduces redundancy during natural product discovery programs 12 . Metabolic profiling is applied to annotate and biotechnologically optimize the production of pharmacologically active secondary metabolites 13 , using multivariate data analysis like Principle component analysis (PCA) to reduce dimensionality of the data and highlights trends, groups or clusters 14 along with molecular correlation analysis to pinpoint the putatively active components.
Consequently, the present work, illustrate the cytotoxic potential of the leaves and stems of five Tabebuia species against three different cell lines [HepG2 (human hepatoma), MCF-7 (human breast adenocarcinoma), and CACO2 (human colon adenocarcinoma) cells]. Metabolic profiling and dereplication approaches were used in order to explore the differences in secondary metabolite patterns between the ten tested samples of Tabebuia species and to develop a database of annotated compounds with a chemical survey of compounds responsible for cytotoxic activity.

Results and discussion
Cytotoxic activity of plant extracts. The results of the cytotoxic activity of the ten plant samples against the three cancer cell lines, HepG2, MCF-7, and CACO2, are represented in (Supplementary Table S1). All extracts except T. guayacan (Seem.) Hemsl. stem extract exhibited significant cytotoxic activity against the HepG2 cell line at the examined concentrations, with T. rosea (Bertol.) DC. stem extract being the most potent (IC 50 4.7 µg/mL). In the MCF-7 cell line, all tested plant samples displayed significant cytotoxic activity except T. guayacan (Seem.) Hemsl. leaf extract and T. rosea (Bertol.) DC. leaf and stem extracts, with T. pallida L. stem extract being the most active (IC 50 6.3 µg/mL). All tested extracts displayed potent activity against the CACO2 cell line, but T. pulcherrima stem extract was the most potent (IC 50 2.6 µg/mL).
Surveying the relevant literature, it was found that the traditional use of Tabebuia species for treating cancer began in Brazil in the 1960s and led to increased sales of the bark and wood of the trunk of T. impetiginosa (Mart. Ex DC.) Standl., T. rosea (Bertol.) DC., and T. serratifolia (Vahl) 15 . The in vitro antitumor potential of the total alkaloid extract of T. rosea (Bertol.) DC. leaves was evaluated in human leukemic cells (MOLT-4) 16 and the anticancer properties of T. pallida L. leaves were evaluated in Ehrlich ascites carcinoma 17 . In terms of bioactive compounds, naphthoquinones, particularly β-lapachone, have been reported to selectively induce apoptotic cell death in various cancers, including breast cancer, prostate cancer, and leukemia 18 . According to the United States National Cancer Institute plant screening program, crude extract is reported to have in vitro cytotoxic activity if the IC 50 is < 30-40 μg/mL 19 . All of the ten extracts tested in the present study showed potent in vitro cytotoxic activity, except T. guayacan (Seem.) Hemsl. leaf extract and T. rosea (Bertol.) DC. leaf and stem extracts, which showed no cytotoxic activity against MCF-7 cells, and T. guayacan (Seem.) Hemsl. stem extract, which showed no cytotoxicity against HepG2 cells.

LC-HRESIMS.
The metabolic profiling of the five Tabebuia species against the Dictionary of Natural Products (DNP) and METLIN databases resulted in annotation of forty compounds belonging to different chemical classes (Fig. 1). Iridoids and phenylethanoids were the major detected chemical classes. "The entire list of 40 compounds can be found as Supplementary  10.65), which is an universal endogenous plant metabolite, and which belongs to more than 30 plant families. Moreover, we have reported the presence of rutin from Tabebuia aurea in our previous work 20 . The figure (S1) shows the retention time alignment of rutin's peak in all samples, which was within the acceptable RT tolerance threshold according to our previous work 21 . It is of note, however, that while the use of rutin allowed us to monitor for RT variability, it did not allow for the correction of potential analytical errors of injection and sample preparation".
Metabolic profiling and molecular correlation analysis. To achieve the best coverage of the Tabebuia metabolome, the positive and negative mass spectral data were merged into one data matrix. The data were processed using MZmine2 according to a method developed previously in our lab 22 and transferred to an in-house database Excel file with a built-in DNP database for dereplication purposes. PCA was applied to the data to test the similarity and/or variation of the chemical profiles of the tested species. PCA is an unsupervised multivariate data analysis that aims to reduce the dimensionality of data to reveal clusters, groups, and/or outliers among observations 23 . The PCA score plot ( Fig. 2A) showed a respective total variance of 42% and 23% for PC1 and PC2 and demonstrated clustering of T. pulcherrima and T. pallida L. leaf and stem extracts (Ta-3, Ta-4, Ta-7, and Ta-8, respectively). This indicates that these extracts have similar chemical profiles. Further, PCA showed a dispersal of T. rosea (Bertol.) DC. (Ta-5), T. argentea Britton (Ta-9), and T. guayacan (Seem.) Hemsl. (Ta-1) leaf extracts as well as the stem extract of T. rosea (Bertol.) DC. (Ta-6), which indicates that these extracts differ in terms of their chemical profiles. The PCA loading plot (Fig. 2B) highlights the fact that metabolites contribute to this variation, as the T. rosea (Bertol.) DC. (Ta-5) and T. argentea Britton (Ta-9) leaf extracts were characterized by xanthonelike molecules corresponding to C 34 H 40 O 9 , which is equivalent to, e.g., moreollic acid or scortechinone A at m/z (retention time in minutes) 592. 268 24 . Scortechinone A and M were also identified previously from G. scortechinii 15  The only hit for this molecule in the DNP database was the fungal metabolite thermolide D, which was filtered out due to taxonomic irrelevance. The dereplication study of the discriminatory molecules was found to be matching with the biological investigation results, as T. rosea (Bertol.) DC. (Ta-5), T. argentea Britton (Ta-9) leaf extracts as well as the stem extract of T. rosea (Bertol.) DC. (Ta-6) were found cytotoxic to one or more of the tested cancer cell lines i.e. HepG2, MCF-7 and CACO2.The biological investigation of Tabebuia species showed that T. rosea (Bertol.) DC. stem  The biological data refer that T. pallida L. leaf (Ta-7) might possess the most cytotoxic agents followed by its stem extract (Ta-8), T. rosea (Bertol.) DC. stem (Ta-6) and T. pulcherrima leaf and stem (Ta-3 and Ta-4). To annotate the agents responsible for this cytotoxicity, the spectral data, integrated with the metadata (biological data), were analyzed via a similarity correlation analysis 26 . The molecular correlation analysis used a Pearson correlation coefficient to detect molecules linked with observed cytotoxicity. In the analysis in Fig. 3A, metabolites that correlate with each other and with the biological data are labeled with their molecular weights and colored in the pie chart according to the concentration of the metabolite in each of the tested species. The Pearson correlation coefficient was set to 0.8. An extracted analysis (Fig. 3B) 31 . Finally, the compounds at m/z 573.278 and at m/z 467.236 were not annotated previously. Unfortunately, relationships between the spectral data and CACO2 metadata were not detected using the similarity correlation analysis. This might indicate that there is no one specific molecule or class of molecules that mediate the biological activity in this case, but a synergism of more than one molecule and/or class of compounds. However, in order to check which metabolites may contribute to cytotoxicity against CACO2 cells, an orthogonal partial least square discrimination analysis module was created. The model has a strong goodness of  (Fig. 4) is a very useful tool for comparing the magnitude of a variable to its reliability, where regression coefficients related to scaled and centered X-variables are displayed. This scaling of the data makes the coefficients comparable. Thus, these coefficients express how strongly Y is correlated to the systematic part of each of the X-variables. The molecules highly correlated with CACO2 cytotoxicity were checked and only those with high coefficients of variation and 95% confidence level limits not crossing zero were chosen.  13 , respectively, which are equivalent to the iridoid glycosides 10-O-foliamenthoylaucubin and nudifloside, respectively. These compounds were previously reported to have antitumor and antimicrobial properties 32,33 . Some molecules were detected previously by PCA analysis, i.e., those at m/z 592.268 and 620.299, which correspond to xanthone and limonoid, respectively. The rest of the detected molecules have not previously been annotated, and further study will be necessary to elucidate their structures. Figure 4B illustrates the intensity of the significant metabolites across Tabebuia species; those at m/z 503.346, 592.268, and 620.299 were the most intense among all of the highly correlated metabolites. In summary, metabolic study is a powerful tool that rapidly annotated the bioactive metabolites that may mediate in the demonstrated cytotoxicity of Tabebuia extracts against MCF-7, HepG2, and CACO2 cancer cell lines. The dereplication study relied on precise molecular formula prediction and chemotaxonomic filtration to minimize the number of hits per molecular formula, leading to the tentative annotation of the top hits 34 . Searching the literature for the bioactivities reported previously for these molecules proved they could be mediators of the exhibited cytotoxicity.

Material and methods
Collection  Cytotoxic activity testing. The cytotoxicity of the plant sample extracts was evaluated in cell lines using a 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay 35 . HepG2 (human hepatoma), MCF-7 (human breast adenocarcinoma), and CACO2 (human colon adenocarcinoma) cells were maintained in RPMI medium (Merck, Darmstadt, Germany) supplemented with 10% fetal bovine serum. MCF-7 cells were cultured at 37 °C and 5% (v/v) CO 2 in RPMI 1640 medium supplemented with 5% (v/v) fetal bovine serum, 1% (w/v) l-glutamine, 1% sodium pyruvate, and 0.4% (w/v) antibiotics (50 U/mL penicillin, 50 mg/mL streptomycin). Cells were obtained from the American Type Culture Collection (Rockville, MD, USA; HPACC, Salisbury, UK) and sub-cultured twice per week. All chemicals and reagents were purchased from Sigma Aldrich (Darmstadt, Germany). To normalize cell viability values, each plate included a triplicate of cells treated with the compound carrier dimethyl sulfoxide to define 100% viability as well as a triplicate of cells incubated with a cytotoxic mixture [200 ng/mL tumor necrosis factor, 200 ng/mL CD95L (Fas ligand), 200 ng/mL tumor necrosis factorrelated apoptosis-inducing ligand, 25 g/mL cycloheximide, 1% (w/v) sodium azide] to define maximal cell death and thus 0% viability. All other viability values were normalized according to the averages of these triplicates and analyzed using Graph Pad Prism 5 software (La Jolla, CA, USA). 5-Flurouracil was used as a positive control.
Liquid chromatography-high-resolution electrospray ionization mass spectrometry. One mg of each crude extract (of the ten samples under investigation) was weighted using sensitive electric balance (Sartorius, type 1712, Germany) and dissolved in 1 mL HPLC grade methanol then it was analyzed according to Abdelmohsen et al. 36 on an Acquity Ultra Performance Liquid Chromatography system coupled to a Synapt G2 HDMS quadrupole time-of-flight hybrid mass spectrometer (Waters, Milford, MA, USA). Chromatographic separation was performed on a BEH C18 column (2.1 × 100 mm, 1.7 µm particle size; Waters, Milford, MA, USA) with a guard column (2.1 × 5 mm, 1.7 µm particle size) and a linear binary solvent gradient of 0-100% eluent B over 6 min at a flow rate of 0.3 mL min -1 using 0.1% formic acid in water (v/v) as solvent A and acetonitrile as solvent B. All reagents were of analytical grade and were purchased (Fisher Scientific, Hemel Hempstead, UK). The injection volume was 2 µL and the column temperature was 40 °C. After chromatographic separation, the metabolites were detected by mass spectrometry using electrospray ionization in the positive mode; the source operated at 120 °C. The electrospray ionization capillary voltage was set to 0.8 kV, the sampling cone voltage was set to 25 V, and nitrogen was used as the desolvation gas (at 350 °C and a flow rate of 800 L h -1 ) and the cone gas (at a flow rate of 30 L h -1 ). The mass range for time-of-flight mass spectrometry was set to m/z (mass-tocharge ratio) 50-1200. Ms converter software was used in order to convert the raw data into divided positive and negative ionization files. Obtained files were then subjected to the data mining software MZmine 2.12 (https:// bmcbi oinfo rmati cs. biome dcent ral. com/ artic les/ 10. 1186/ 1471-2105-11-395) for deconvolution, peak picking, alignment, deisotoping, and formula prediction. For the combination of negative and positive ionization mode data files that were generated by MZmine, Excel macros were used. Both negative and positive ionization switch modes were used to include the highest number of metabolites from the investigated methanol extracts subjected to LC-HR-ESIMS analysis. The dereplication was achieved for each m/z ion peak with metabolites recorded in the customized databases based on established parameters (m/z threshold of ± 3 ppm and retention time), consequently, the number of the remaining unknown metabolites in each species was refine. The raw data was processed, aligned and merged into one dataset according to the method previously developed in our lab 20,22,37 . The molecular correlation analysis was created via specific application of the cytoscape software (version 3.4.0) as reported in our previous work 25 . The Expression Correlation app, implemented by Sander Group (Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York City), was used to compute a similarity network from either observation (active fractions) or their corresponding features (m/z) in data matrix. Similarity network is using the Pearson correlation coefficient to link the active fractions (observations correlation network) or their corresponding metabolites (features correlation network). A feature correlation network was created to explore which of the metabolites will be highly correlated with the bioactivity (represented by percentage of viability). The negative correlation threshold was set to 0.7 whereas the positive one was neglected. The network was mapped via organic y files layout, a kind of spring-embedded algorithm.

Conclusion
The metabolic profiling of five Tabebuia species, using LC-HRMS lead to annotation of 40 compounds belonging to ten different chemical classes. PCA was effectively employed to test the similarity and/or variation of the chemical profiles of the tested species. PCA results demonstrated similar chemical profiles of T. pulcherrima and T. pallida L., and a unique chemical profile of T. argentea Britton, T. guayacan (Seem.) Hemsl., and T. rosea (Bertol.) DC. The biological investigation results, indicated T. rosea (Bertol.) DC. stem extract as the most cytotoxic to HepG2 cancer cell line, whereas, T. pallida L. stem was the most active against MCF-7 cancer cell line. Furthermore, T. pulcherrima stem extract was the most cytotoxic to CACO2. The molecular correlation analysis highlighted the compounds responsible for cytotoxic activity against HepG2, MCF-7 and CACO2 cell lines. Interestingly, the molecules detected at m/z 565.357 and 360.127, directly linked to HepG2 activity and that detected at m/z 573.278 directly linked to MCF-7 activity, were not previously reported, suggesting that a new chemical structure still to be discovered.