Bioprospecting of Sechium spp. varieties for the selection of characters with pharmacological activity

Bioprospecting identifies new sources of compounds of actual or potential economic value derived from biodiversity. For bioprospecting purposes, ten genotypes of Sechium spp. were analyzed through a meta-analysis of 20 information sources. The variables were five morphological variables, 19 biochemical variables, the anti-proliferative activity of extracts on five malignant cell lines, and 188 polymorphic bands of amplified fragment length polymorphisms (AFLP). The aim was to identify the most relevant variables for designing genetic crosses. Significant relationships were found between morphological and biochemical characters and anti-proliferative activity in the cell lines. Principal component analysis (SAS/ETS) retained five principal components, which accounted for 80.81% of the accumulated genetic variation. Variables were identified with a statistical significance (< 0.7 and Pearson values ≥ 0.7), and 110 genetic bands were selected. Thirty-nine variables were recovered using the NTSYSpc software: 30 showed a Pearson correlation > 0.5 and nine showed values < 0.05. Finally, a cladistics analysis highlighted 65 genetic bands. It also highlighted fruit color, presence of thorns, bitter flavor, piriform and oblong shape, content of chlorophylls a and b, presence of cucurbitacins, and the IC50 effect of chayote extracts on four cell lines.
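The abstract screens variables by the strength of their Pearson correlation. The following is a minimal sketch of such a screen. It assumes that each variable is correlated with a single response per genotype (for example, an IC50 value). It also substitutes a permutation test for whichever significance test the study used, since this section does not specify one. The function names and default thresholds are illustrative and are not taken from SAS/ETS or NTSYSpc.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def permutation_p(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for Pearson's r (assumed test; shuffles y)."""
    rng = np.random.default_rng(seed)
    observed = abs(pearson_r(x, y))
    hits = sum(abs(pearson_r(x, rng.permutation(y))) >= observed for _ in range(n_perm))
    # +1 in numerator and denominator counts the observed arrangement itself
    return (hits + 1) / (n_perm + 1)

def screen_variables(X, y, names, r_min=0.7, alpha=0.05):
    """Keep columns of X whose |r| with y reaches r_min and whose p-value is below alpha.

    X: rows = genotypes, columns = variables; y: one response value per genotype.
    Constant columns give an undefined (nan) r and are therefore never kept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    kept = []
    for j, name in enumerate(names):
        r = pearson_r(X[:, j], y)
        if abs(r) >= r_min:
            p = permutation_p(X[:, j], y)
            if p < alpha:
                kept.append((name, round(r, 3), p))
    return kept
```

With only ten genotypes, a permutation test avoids relying on normality. However, the smallest attainable p-value is limited by `n_perm`.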


Results and discussion
The distribution of S. chinantlense, S. compositum and S. edule across the morphological, biochemical, functional biological activity and genetic characterization variables produced a monophyletic tree in which albus minor is placed closest to the basal state of the genotypes or taxa (Fig. 1). This differs from the report of Cadena-Iñiguez et al. 27, whose phylogenetic order places albus minor, albus dulcis and albus levis as the genotypes of greatest morphostructural evolution relative to the wild relative, based on their adaptive specialization to the environment. In this study, the yellow-fruited genotypes were those nearest to the root. This is because they show the lowest number of characters associated with the bioprospecting variables of proliferation, IC 50 and optimal concentration (the extract dose applied to malignant cell lines that yields the lowest percentage of proliferation). The Bootstrap/Jackknife re-sampling methods showed a high degree of parsimony, with support values ranging from 100/100 (maximum) to 57/64 (minimum). The cladogram had a length L = 566, a consistency index (CI) of 59 and a retention index (RI) of 51 45. These indices account for homoplasy, that is, apparently similar characters that result from independent evolution. The three genotypes of the albus group showed lower biological activity, owing to their lower number of associated characters (Table 1). The exception was albus levis, which showed cucurbitacin P (CbP) content as an autapomorphic character.
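The tree length L, CI and RI quoted above can be illustrated with a small sketch. It assumes unordered characters and uses the Fitch algorithm to count steps on a rooted binary tree. For each character, it takes the minimum possible steps m as (number of states − 1) and the maximum possible steps g as the length on a star tree. The helper names and the toy matrix are hypothetical; WinClada computes these indices internally. The indices are ratios between 0 and 1, so the values reported above are presumably on a 0 to 100 scale.

```python
from collections import Counter

def fitch(tree, states):
    """Fitch small-parsimony pass for one unordered character.

    tree: a leaf name (str) or a (left, right) tuple; states: leaf name -> state.
    Returns (candidate state set at this node, number of steps below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # Disjoint child sets force one extra state change (a step)
    return left_set | right_set, left_steps + right_steps + 1

def length_ci_ri(tree, matrix):
    """Tree length L, ensemble consistency index CI and retention index RI.

    matrix: taxon -> tuple of character states (same length for every taxon).
    """
    n_taxa = len(matrix)
    n_chars = len(next(iter(matrix.values())))
    total_s = total_m = total_g = 0
    for j in range(n_chars):
        states = {taxon: row[j] for taxon, row in matrix.items()}
        counts = Counter(states.values())
        total_s += fitch(tree, states)[1]           # observed steps s
        total_m += len(counts) - 1                  # minimum possible steps m
        total_g += n_taxa - max(counts.values())    # maximum possible steps g (star tree)
    ci = total_m / total_s if total_s else 1.0
    # RI is undefined when no character is parsimony-informative (g == m)
    ri = (total_g - total_s) / (total_g - total_m) if total_g != total_m else float("nan")
    return total_s, ci, ri
```

For example, on the tree `(("A", "B"), ("C", "D"))` with characters A=(0, 0), B=(0, 1), C=(1, 0) and D=(1, 1), the sketch gives L = 3, CI = 2/3 and RI = 0.5. The second character conflicts with the tree and adds the homoplastic step.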
Starting from the fourth clade (nigrum levis, S. edule, virens levis, nigrum xalapensis), a greater association was observed between the percentage of proliferation and IC 50 on the cell lines and several characters of synapomorphic origin [45][46][47]. These characters were the piriform shape of the fruit; the presence of cucurbitacin I (CbI), cucurbitacin D (CbD) and dihydrocucurbitacin E (DHCbE); and high contents of chlorophylls a and b, soluble solids and total carotenoids. This could reflect the effects of human manipulation, which created different life histories in the genotypes and fostered the variation of S. edule as an intraspecific complex (Tables 1 and 2). The ancestral characters of symplesiomorphic origin are related to the percentage of proliferation and IC 50 in the HeLa, P388 and L929 cell lines. These characters include dark green color, total solids, ascorbic acid, chlorophylls a and b, and titratable acidity. For S. chinantlense and S. compositum, the piriform fruit shape, polymorphic bands, percentage of proliferation and IC 50 stand out in their association with the leukemia line WEHI-3 and with HeLa, J774 and L929 (Tables 3 and 4). S. chinantlense and S. compositum showed relevant morphological and biochemical characters in the clades, in contrast with S. edule and its varietal groups. A possible reason is that the first two species have not been subjected to human manipulation. To the best of our knowledge, there is no reported evidence of their use as food or medicine, and they have not developed morphological variation as S. edule has 48. Mendoza et al. and Soler 49 note that morphological and biochemical characters result from genetic expression. In this regard, the same clade shows that the polymorphic bands of S. chinantlense (between 42 and 227) and of S. compositum (between 44 and 218) are ancestral states of speciation 49.
www.nature.com/scientificreports/
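IC 50 and the optimal concentration recur throughout this discussion as bioprospecting variables. This section does not describe how the primary studies derived them. A common, simple approach is linear interpolation on log10(dose) between the two doses that bracket 50% proliferation. A four-parameter logistic fit is the usual, more rigorous alternative. The sketch below assumes the simple interpolation method and proliferation expressed as a percentage of an untreated control. The function names are hypothetical.

```python
import math

def ic50_log_interp(doses, proliferation, level=50.0):
    """Estimate IC50 by linear interpolation on log10(dose).

    doses: positive concentrations; proliferation: % of untreated control.
    Returns None when the response never crosses `level` between two doses.
    """
    points = sorted(zip(doses, proliferation))
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= level >= p_hi and p_lo != p_hi:
            frac = (p_lo - level) / (p_lo - p_hi)
            log_dose = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_dose
    return None

def optimal_dose(doses, proliferation):
    """Dose giving the lowest percentage of proliferation (ties: lowest dose)."""
    return min(zip(proliferation, doses))[1]
```

For instance, doses of 10, 100 and 1000 µg/mL giving 80%, 50% and 20% proliferation yield an IC50 of 100 µg/mL and an optimal dose of 1000 µg/mL.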
The characters located in the branches prior to the formation of a clade 47,50 show that several characters are associated with variables of therapeutic interest 32,51. These are chlorophylls a and b, cucurbitacin E (CbE), cucurbitacin B (CbB), dihydrocucurbitacin D (DHCbD), dihydrocucurbitacin B (DHCbB), total solids, ascorbic acid, dark green to very dark green color, and bitter flavor 3,13. Apomorphic characters are evolutionarily novel biological traits derived from the phylogenetically most proximal ancestral taxon 46,47. The apomorphic characters (Table 4) comprise few characters and character states associated with variables of functional biological activity. These are five morphological characters (thorns, bitter flavor, piriform shape, and width and length of the fruit); eight biochemical characters (higher concentrations of chlorophyll a and chlorophyll b, cucurbitacin P (CbP), isocucurbitacin E (ICbE), isocucurbitacin D (ICbD), DHCbE, and light green and dark green fruit color); and biological activity variables (such as the percentages of proliferation in the P-388 and WEHI-3 lines and IC 50 in HeLa). Symplesiomorphic characters are traits shared by two related taxa that coincide with the character present in the common ancestor of both. In contrast with the apomorphic characters, the symplesiomorphic characters (Table 2) were more numerous within the cladogram. They were associated with the antiproliferative effect on P-388, L-929, J-774, HeLa and WEHI-3, with IC 50 and with the most efficient dose. They also had different character states per genotype, such as the oblong shape and large size of the fruit. Those of biochemical origin showed diverse values of chlorophylls a and b, which influence the variation of fruit color from yellow to green of different intensities. They also included ascorbic acid, total solids, titratable acidity, and the presence of CbB, DHCbB and DHCbD.
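The distinction between apomorphic and plesiomorphic states can be made concrete with outgroup comparison. The sketch below assumes that the ancestral (plesiomorphic) state of a character is known, for example from an outgroup, and that the clades of the cladogram are given as sets of taxon names. The labels follow the definitions in the text. A derived state shared by taxa that do not form a clade is reported as homoplasy. The function and data are hypothetical illustrations, not the WinClada procedure.

```python
from collections import defaultdict

def classify_states(ingroup, ancestral_state, clades):
    """Label each observed state of one character by outgroup comparison.

    ingroup: taxon -> state; ancestral_state: state assumed plesiomorphic
    (e.g. the outgroup state); clades: set of frozensets of taxon names.
    """
    holders = defaultdict(set)
    for taxon, state in ingroup.items():
        holders[state].add(taxon)
    labels = {}
    for state, taxa in holders.items():
        if state == ancestral_state:
            # Ancestral state: "shared" only if more than one taxon retains it
            labels[state] = "symplesiomorphy" if len(taxa) > 1 else "plesiomorphy"
        elif len(taxa) == 1:
            labels[state] = "autapomorphy"
        elif frozenset(taxa) in clades:
            labels[state] = "synapomorphy"
        else:
            # Derived state shared by taxa that do not form a clade
            labels[state] = "homoplasy"
    return labels
```

For example, with clades {A, B} and {C, D, E}, an ancestral state 0 and states A=1, B=1, C=0, D=0, E=2, state 1 is a synapomorphy, state 0 a symplesiomorphy and state 2 an autapomorphy.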
Thus, the ancestral characters showed higher variability and were identified by the cladistics approach as bioprospecting variables that determine functional biological activity. Meta-analyses are used in health and natural products research to synthesize published information on medications and adverse drug reactions 52,53. They have recently been applied in agriculture to support the nutritional improvement of crops (chickpea and avocado). There, they have outlined varied responses to factors analyzed in independent studies, such as micronutrients and mycorrhizal symbiosis [54][55][56][57]. Other studies seek to elucidate new agri-food applications, such as soil amendment, water use, crop selection and optimization of natural resources over time and across regions 58,59.
In this manner, our results offer an approach with greater statistical parsimony and relevance for identifying outstanding variables in Sechium spp. The approach is based on accurate information with traceable data and reproducible results, avoiding bias to obtain a scientifically valid view 57,60.
Multivariate analysis. The multivariate analysis identified the variables that explain most of the total variability in the data, explored their correlations and reduced the dimensionality of the analysis through new indexes. Five principal components (PCs) accounted for an accumulated 80.81% of the variation (Tables 4, 5). The five PCs identified 110 polymorphic bands with higher statistical weight (not shown). However, when selecting parents for genetic improvement, it is relevant to identify the bands associated in each component with bioprospecting variables (Table 5). Grouping methods form groups of basic characterization units (BUCs) according to the similarities or differences between pairs, using a matrix of indexes 61. In this regard, the dendrogram (Fig. 2) shows four groups of genotypes. S. chinantlense and S. compositum are separated from the S. edule genotypes owing to their wild origin, at a greater distance, which is equivalent to a lower similarity index. The wild species share morphological and biochemical characters (color, bitter flavor, cucurbitacins) with S. edule. Nevertheless, S. edule is placed far from them and shares a higher number of characters with its domesticated variants. The edible chayotes n. xalapensis, v. levis and n. spinosum stand out with green fruits and larger size, with larger thorns in n. spinosum. Because of their biological activity, they are placed closer to the two wild species 16,24. The yellow-fruited genotypes, by contrast, are the most dissimilar and the least relevant for bioprospecting variables. They have 10-100% less cucurbitacin content than the green chayote variants and their wild ancestor (S. edule), respectively 15. Hence, for the purpose of anti-proliferative activity, they are not considered bioprospectively relevant.
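Retaining five PCs that jointly explain 80.81% of the variation is a cumulative-variance criterion. The following minimal NumPy sketch shows the idea. It assumes a PCA on the correlation matrix, that is, on standardized variables, because the variables have different units. The study used SAS/ETS, whose exact settings are not given here. The eigenvectors returned are the characteristic vectors of the kind listed in Table 5. With ten genotypes, at most nine eigenvalues can be nonzero, whatever the number of variables.

```python
import numpy as np

def pca_from_correlation(X):
    """PCA on the correlation matrix of X (rows = genotypes, columns = variables).

    Returns (proportion of variance per PC, cumulative proportion, eigenvectors).
    Columns with zero variance must be removed first (their correlation is undefined).
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh: ascending order, symmetric input
    order = np.argsort(eigvals)[::-1]              # largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)   # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return explained, np.cumsum(explained), eigvecs

def n_components_for(cumulative, threshold):
    """Smallest number of leading PCs whose cumulative proportion reaches threshold."""
    return int(np.searchsorted(cumulative, threshold) + 1)
```

For example, for three variables where the second is twice the first and the third is uncorrelated with both, the PCs explain 2/3, 1/3 and 0 of the variance. An 80% threshold therefore retains two components.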
Figs. 1 and 2 show a similar distribution of the Sechium spp. genotypes, in which only the positions of S. chinantlense and S. compositum are inverted. However, they do not change branch in the grouping, so the distances are conserved, and the same occurs with S. edule and nigrum levis. Compared with the multivariate analysis, the cladistics approach shows the relevant character and its state. This improves the identification of the variables associated with functional biological activity (bioprospecting variables) and can help to discriminate them or to decide their importance in basic research or genetic improvement programs. Cladistics analysis helps to identify phylogenetic relationships with an evolutionary approach to taxa 47. In this case, however, the cladogram identifies the possible bioprospecting characters, which strongly influence the relationship between taxa. It shows that the wild genotypes present characters of higher statistical weight, suggested as bioprospecting variables, in comparison with S. edule and its domesticated genotypes. Within the latter group, n. xalapensis, virens levis and n. spinosum, identified as the edible types with the highest level of domestication, stand out solely because of their higher biological efficiency. Meanwhile, a. minor, a. dulcis and a. levis, whose morphological, biochemical and genetic characters do not contribute significantly to the bioprospecting variables, are more closely related to each other (Fig. 2). This also highlights that they are more distant from the bitter-flavored S. chinantlense and S. compositum, as well as from the green-fruited groups of S. edule. The wild genotypes synthesize a higher amount of secondary metabolites [63][64][65] than the domesticated variants 66. Among the terpenes is a group of triterpenes known as cucurbitacins, closely associated with the Cucurbitaceae family, to which the bitter flavor of the wild types of Sechium is attributed. Two profiles are important bioprospecting variables in the selection of parents for genetic crossing. The first comprises fruits that are large, piriform or oblong and dark green, with thorns and bitter flavor. The second comprises fruits with the highest levels of chlorophylls a and b and the presence of cucurbitacins (ICbE, DHCbE, CbE, ICbD, CbB, DHCbB, DHCbD and CbQI). Both assume a possible association with the corresponding polymorphic bands 63. Distance-based analyses have been criticized because they do not take evolutionary processes into account, given that several correlated characters do not evolve independently 67. Cladistics methods, in contrast, analyze each character individually rather than reducing the information to indexes of genetic distance 68. It is important to highlight that the same pattern can often be explained by different population histories 46. This suggests examining aspects related to the number of genes that determine a character. It also points to a continuous dynamic interaction of adaptation with the conditions in which the population grows, in which each species adjusts the information contained in its genome according to its survival needs. Therefore, the management of descriptors, characteristics or measurable attributes is important for recording and evaluating the shape, structure or behavior of a genotype, because these are of interest to plant breeders and agronomists 69. Our results suggest that even when the genotypes are taxonomically very close, particularly in the S. edule complex, there are notable differences in their biochemical components. This evidences a diversity of metabolites that can be used in applications other than food, for example for therapeutic or pharmacological uses.
Table 5. Characteristic vectors of the analysis of 229 morphological, biochemical, genetic, and biological activity variables of fruits from ten Sechium spp. genotypes. Values in bold are those that were statistically most significant for each principal component.
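The distance-based grouping discussed above (the NTSYSpc dendrogram in Fig. 2) can be sketched as follows. This section does not state the similarity coefficient or the clustering algorithm. The sketch therefore assumes two common choices for presence/absence AFLP bands: the Jaccard coefficient, which ignores shared band absences, and UPGMA average-linkage clustering. The band profiles used in the example are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity for presence/absence (1/0) band vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def upgma(profiles):
    """UPGMA clustering on Jaccard distances (1 - similarity).

    profiles: genotype name -> band vector. Returns the dendrogram as nested
    tuples and the distance at which each merge happened, in merge order.
    """
    size = {name: 1 for name in profiles}
    dist = {frozenset(pair): 1.0 - jaccard(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}
    levels = []
    while len(size) > 1:
        closest = min(dist, key=dist.get)
        a, b = sorted(closest, key=str)        # deterministic left/right order
        merged = (a, b)
        levels.append(dist[closest])
        for c in size:
            if c not in (a, b):
                # Average linkage: size-weighted mean of the two old distances
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
    (root,) = size
    return root, levels
```

Note that the resulting tree summarizes overall similarity only. As the text points out, it says nothing about which individual character or state drives a grouping, which is what the cladistics approach adds.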
In a pharmacological study, Lira 2 reported diuretic properties of the seeds, as well as cardiovascular and anti-inflammatory properties of the leaves and fruits. It has also been observed that extracts from S. edule are active against Gram-positive bacteria 70, and an in vitro study 11 demonstrated their antioxidant activity. Other studies in Wistar rats demonstrated that S. edule extracts have hypotensive activity 71. The extracts can also alter the labeling of blood elements with the radionuclide technetium-99, modify erythrocyte morphology, promote the fixation of radioactivity in blood proteins, and alter the biodistribution of the radiopharmaceutical sodium pertechnetate. Other studies have shown that these extracts induce DNA damage 28,33 and reduce glucose and globulin levels and diastolic blood pressure.
In sum, the biological effects shown may be explained by compounds in chayote that act in vivo as active metabolites with antioxidant properties [72][73][74][75][76]. Phytochemical studies of S. edule have identified non-phenolic alkaloids, saponins, sterols, triterpenes 4 and eight glycosylated flavonoids 12. This pattern of compounds is shared by taxonomically related species 62, and many of these compounds are associated with anti-tumor activity 77. Other studies isolated and characterized the protein sechiumin from the chayote seed. This molecule inactivates ribosomal function in the HeLa cervical cancer line and was proposed as a possible chemotherapeutic agent 78. This is highly relevant and contributes to the search for natural alternatives for treating diseases of public health interest. However, 99% of the pharmacological and phytochemical studies reported for S. edule do not describe the biological type or varietal group evaluated. This hinders experimental reproducibility and poses risks for pharmacological application in humans. It is therefore important to apply taxonomic criteria to classify the genotypes of an intraspecific complex such as S. edule, since this facilitates their identification through secondary characters.

Conclusions
The meta-analysis identified several principal bioprospecting variables: the dark green color, bitter flavor, presence of thorns, and oblong and piriform shape of the fruits. It also identified the presence of eight cucurbitacins (ICbE, DHCbE, CbE, ICbD, CbB, DHCbB, DHCbD and CbQI) and the highest values of chlorophylls a and b. Among the polymorphic bands, 49 were associated with apomorphic characters and 38 with symplesiomorphic characters in the genotypes evaluated. Taxonomic identification (inter- and intraspecific) of biological variants is very important, mainly for S. edule genotypes. These genotypes present large morphobiochemical, genetic and functional biological activity differences, which can be exploited in various investigations.

Experimental section
General procedures. Meta-analysis provides a systematic, objective and scientific method for the quantitative review of primary studies with a common theme 79,80, based on a rigorous synthesis of the best available evidence 81. For this analysis, the information on Sechium was catalogued under the initial criterion of identification of the species and genotype (varietal group) employed. The search was performed in Google Scholar, CAB Abstracts, Agris, Web of Science, Biological Abstracts, Microsoft Academic and Scopus. The keywords were S. edule, S. edule secondary metabolism, S. chinantlense, S. compositum, chayote, varieties, apoptosis, anti-proliferative, characterization, cucurbitacins, DNA extraction, peroxidases, phenols, flavonoids and chromatography. The search returned 7427 results. After applying the criteria of species and varietal group identified in each publication, the sample was reduced to 20 (Table 6). From these, a database was built with five morphological variables of fruits (flavor, color, shape, size and thorns). It also included 19 biochemical variables (chlorophylls a and b, presence of 13 cucurbitacins, total carotenoids, total solids, titratable acidity and ascorbic acid). Finally, it included three determinations for the biological evaluation of plant extracts: percentage of proliferation, IC 50 and optimal concentration (Table 7).
Selection criteria. The systematic reviews involved in the meta-analysis used previously published information, reanalyzed with new approaches and research perspectives 58,59. An important selection criterion was that studies provide complete information, with traceable data and reproducible results, to reduce or avoid research bias 57,60. The publications considered addressed the fruits of S. edule, S. chinantlense and S. compositum as a central theme and shared biological activity evaluations with very similar methods. They also considered variables that have allowed some type of varietal classification related to the subject in previous reports. Studies that did not specify the genotype used were excluded (Table 6).
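The reduction from 7427 search results to 20 publications is a screening step. The sketch below assumes a simple record structure, which is hypothetical. It also encodes one interpretation of the criteria above: fruit studies of the three species are kept, and S. edule records must report their varietal group so that the genotype is identifiable. Collapsing duplicate titles retrieved from several databases is an added assumption that the text does not describe.

```python
from dataclasses import dataclass
from typing import Optional

TARGET_SPECIES = {"S. edule", "S. chinantlense", "S. compositum"}

@dataclass(frozen=True)
class Record:
    title: str
    species: Optional[str]          # species named in the publication, if any
    varietal_group: Optional[str]   # e.g. "nigrum spinosum"; None if not reported
    organ: str                      # plant organ studied, e.g. "fruit"

def normalize(title):
    """Lower-case, alphanumeric-only title used to collapse cross-database duplicates."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def screen(records):
    """Deduplicate records and keep those meeting the (assumed) inclusion rules."""
    seen, kept = set(), []
    for rec in records:
        key = normalize(rec.title)
        if key in seen:
            continue
        seen.add(key)
        if rec.species not in TARGET_SPECIES or rec.organ != "fruit":
            continue
        # Genotype must be identifiable: S. edule records need a varietal group
        if rec.species == "S. edule" and not rec.varietal_group:
            continue
        kept.append(rec)
    return kept
```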
The analysis was developed with two approaches. The first was a cladistics analysis, chosen because it incorporates Popper's critical rationalism through the refutation of phylogenetic hypotheses examined under a principle of parsimony 82,83. It used non-parametric statistics and the WinClada version 1.00.08 software 84,85 (free license) with the Bootstrap/Jackknife re-sampling methods. The genotypes were treated as a population through random simulation 86, with random elimination of variables until a parsimonious cladogram was generated 67. The purpose was to define the stability of the clades and identify the state of the outstanding character(s). The analysis was repeated 1000 times, yielding a percentage used as an index of support, consistency or confidence in the cladograms 87.
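The resampling support described above can be illustrated with a toy sketch. It makes two simplifications. First, instead of a heuristic tree search as in WinClada, each replicate chooses among a small, fixed list of candidate trees using Fitch parsimony length. Second, ties are resolved by splitting the replicate's credit. The bootstrap resamples characters with replacement, and the jackknife deletes each character independently. The deletion probability of 0.368 (about e^-1) is an assumed setting that is common in parsimony jackknifing, not one stated in the text. All names are hypothetical.

```python
import random

def fitch_steps(tree, states):
    """Fitch parsimony steps for one character on a rooted binary tree."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        shared = ls & rs
        return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)
    return visit(tree)[1]

def tree_length(tree, matrix, chars):
    """Total steps over the sampled character indices (repeats count again)."""
    return sum(fitch_steps(tree, {t: row[j] for t, row in matrix.items()}) for j in chars)

def clades(tree):
    """Leaf sets (frozensets) of every internal node of the tree."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = walk(node[0]) | walk(node[1])
        found.add(leaves)
        return leaves
    walk(tree)
    return found

def clade_support(candidates, matrix, clade, method="bootstrap",
                  replicates=1000, delete_prob=0.368, seed=1):
    """Percent of replicates whose most parsimonious candidate contains `clade`."""
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    score = 0.0
    for _ in range(replicates):
        if method == "bootstrap":
            # Resample characters with replacement, same number as the original
            chars = [rng.randrange(n_chars) for _ in range(n_chars)]
        else:
            # Jackknife: delete each character independently; redraw if none remain
            chars = []
            while not chars:
                chars = [j for j in range(n_chars) if rng.random() >= delete_prob]
        lengths = [tree_length(t, matrix, chars) for t in candidates]
        best = [t for t, n in zip(candidates, lengths) if n == min(lengths)]
        # Equally parsimonious trees share the replicate's credit
        score += sum(clade in clades(t) for t in best) / len(best)
    return 100.0 * score / replicates
```

When every character supports the same grouping, that clade receives 100% support under both methods. Conflicting characters lower the percentage, in the way the 57/64 minimum support reported above reflects.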