Introduction

Cordyceps sinensis is a rare medicinal fungus in China and is listed as a second-class national key protected wild plant1,2. It regulates the human immune system, shows antitumor and antioxidant activities, and can improve cachexia and prolong life. In recent years, as C. sinensis has become a favoured functional food, its market demand has increased yearly. Because of its harsh habitat and excessive harvesting, wild C. sinensis is in short supply, and its IUCN Red List status is vulnerable3,4. Finding new sources of medicinal material is therefore a fast and economical way to meet current demand and to compensate for the scarcity of this rare traditional Chinese medicine (TCM). More than 350 species of Cordyceps have been reported worldwide. Of these, only a few, such as Cordyceps sinensis (Berk.) Sacc., Cordyceps sobolifera (Hill.) Berk., Cordyceps ophioglossoides (Ehr.) Link. and Cordyceps militaris (Vuill.) Fr., are used as tonics and therapeutic drugs5. The National Compilation of Chinese Herbal Medicine states: "Cordyceps militaris fruit body and insect body can also be used as C. sinensis medicine"6. In 1983, the asexual form of Cordyceps gunnii was first reported in China as a new species of Paecilomyces, and the mycelia of Paecilomyces gunnii had UV absorption spectra similar to those of C. sinensis7. Cordyceps cicadae is a fungal TCM included in the Traditional Chinese Medicines in Zhejiang Province of Processing Chinese Crude Drugs8. There is sufficient evidence that the nucleosides and some biological activities of C. cicadae are similar to those of C. sinensis9,10,11,12. The medicinal value and health care effects of C. militaris, C. cicadae and C. gunnii suggest that they could become substitutes for C. sinensis13. In addition, the cultivation and growth conditions of C. militaris, C. cicadae and C. gunnii are less demanding than those of C. sinensis, so they are promising new drug sources to supplement C. sinensis.

At present, there are many studies on the biologically active components of C. sinensis. Some metabolites isolated from Cordyceps (including cordycepin, cordycepic acid, several other nucleosides, cyclic peptides, sterols and polysaccharides) and the pharmacological properties of some species have been verified in vitro and in vivo14,15,16,17, demonstrating the chemical diversity of Cordyceps and the strong biological activity of these components, which offer possible leads for drug discovery18,19. Beyond the chemistry and pharmacology of these TCM resources, however, the habitat of Cordyceps species has undergone profound changes under global warming and anthropogenic pressure, leading to a drastic decline in wild populations20,21. Few studies have predicted the future geographical distribution of Cordyceps species or its correlation with chemical composition. In this study, the metabolic profiles of C. sinensis, C. militaris, C. cicadae and C. gunnii were comprehensively analysed by widely targeted metabolomics.

The ecological environment is a main factor affecting the quality and source of Chinese medicinal materials and has always been a focus of research. The synthesis and accumulation of the effective components of TCM are closely related to the ecological environment22. As one of the ecological environmental factors, climatic conditions have diverse and complex effects on the quality of TCM22,23,24. The MaxEnt model is a scheme for predicting the geographical distribution of species that integrates niche theory, occurrence data and the maximum entropy principle. The data used by MaxEnt include: (1) location data for the species (latitude and longitude, and the presence or absence of the species); (2) the environmental conditions necessary for the survival of the species at the same georeferenced points. 
The environmental layers commonly used for modelling are temperature and rainfall. After the model is fitted, the resulting probability function is applied to the geographic area of interest to produce an output file containing the occurrence probability of the species. MaxEnt is a statistical method that uses known species distribution information and environmental data to predict unknown distributions25. Compared with the genetic algorithm for rule-set production (GARP), random forest (RF) and other models, MaxEnt has a relatively simple modelling process, a high tolerance for sample quantity and quality, and excellent predictive performance25. It has been one of the most popular models for predicting species-suitable areas in recent years26,27,28. The MaxEnt model as applied here has some limitations: besides climate variables, other factors, including soil, topography, biotic factors and human activity, also affect the content of active compounds in herbs and are key to the formation of metabolites in medicinal plants. Incorporating these factors into models would further improve predictions of likely changes in the distribution of Cordyceps in China under changing climate conditions. Based on the nonrandom relationship between the climate factors, the spatial distribution data of the species and the study area, the distribution of maximum entropy under the given constraints is taken as the optimal distribution, from which the suitable area of the species is predicted and a spatial distribution model at the geographical scale is constructed28,29,30,31.

Previous studies have combined species geographical distribution prediction with chemical composition analysis32,33,34. For example, Wan et al. established a new method to evaluate the impact of environmental factors on the quality of Codonopsis pilosula based on ultra-performance liquid chromatography (UPLC) fingerprinting and the MaxEnt model32. Li et al. used the MaxEnt model and chemical analysis to model the potential distribution of Coptis herbs as a function of environmental variables and altitude33. Sun et al. studied the quality ecotypes of Panax quinquefolium based on genetic, chemical and ecological characteristics34. These examples suggest that combining the MaxEnt model with metabolomics is both necessary and effective, and such studies provide a reasonable basis for habitat suitability assessment and resource conservation of endangered medicinal species. Building on this previous research and our team's existing work, this study tests the following hypotheses: (1) the metabolites of Cordyceps may be affected by climatic conditions, and the correlation between metabolites and climatic factors should be further explored; (2) based on climate factors and metabolites, alternative species within the genus Cordyceps that correlate strongly with C. sinensis can be identified, in order to expand medicinal resources and provide a reasonable protection strategy for C. sinensis.

In summary, this study used widely targeted metabolomics and the MaxEnt model, with nucleoside metabolites and climate factors as evaluation indices and multivariate statistical analysis, to evaluate how closely C. militaris, C. cicadae and C. gunnii correlate with C. sinensis.

Results

Principal component analysis (PCA)

PCA (Fig. 1) showed that PC1 and PC2 contributed 65% and 20.1% of the variance, respectively, for a cumulative contribution of 85.1%. The dispersion of the samples was used to assess repeatability and between-group differences, and thus the stability of the instrument and the validity of the data: if the dispersion is small, the instrument is stable and the results are reliable. As shown in Fig. 1, the four sample groups are clearly separated, and the inserted mixed QC samples overlap well, indicating good instrument stability. Moreover, the small dispersion of the biological replicates within each group indicates good repeatability and representativeness, which supports the accuracy of the subsequent metabolite analysis.

Figure 1

PCA results of C. sinensis, C. cicadae, C. militaris and C. gunnii samples and quality control.

Figure 2

Species distribution points and sample photographs of C. sinensis, C. cicadae, C. militaris and C. gunnii. The figure was produced with ArcGIS (version 10.7, https://www.esri.com/zh-cn/arcgis/) and MaxEnt software (version 3.4.1, https://biodiversityinformatics.amnh.org/open_source/maxent/).

Orthogonal partial least squares discriminant analysis (OPLS-DA)

In contrast to principal component analysis (PCA), orthogonal partial least squares discriminant analysis (OPLS-DA) is a supervised method35; it separates sample groups more effectively and is better suited to finding differential metabolites. The OPLS-DA score plots (Fig. 3A–C) comparing C. sinensis with C. cicadae, C. militaris and C. gunnii show that the samples of each group cluster together. The two groups in each comparison were clearly separated, indicating that the experiment had good repeatability and that the results were consistent with those of PCA.

Figure 3

OPLS-DA model score diagram and model verification diagram of C. sinensis, C. cicadae, C. militaris and C. gunnii samples. (A) Score diagram of OPLS-DA model between C. sinensis and C. cicadae. (B) Score diagram of OPLS-DA model between C. sinensis and C. militaris. (C) Score diagram of OPLS-DA model between C. sinensis and C. gunnii. (D) OPLS-DA model verification diagram between C. sinensis and C. cicadae. (E) OPLS-DA model verification diagram between C. sinensis and C. militaris. (F) OPLS-DA model verification diagram between C. sinensis and C. gunnii.

In addition, R2X = 0.974, R2Y = 1 and Q2Y = 1 for C. sinensis versus C. cicadae; R2X = 0.982, R2Y = 1 and Q2Y = 1 for C. sinensis versus C. militaris; and R2X = 0.975, R2Y = 1 and Q2Y = 1 for C. sinensis versus C. gunnii. The closer these three indicators are to 1, the more stable and reliable the OPLS-DA model is; here they are close or equal to 1, indicating that the models can reliably discriminate the differential metabolites between C. sinensis and C. cicadae, C. militaris and C. gunnii. To guard against overfitting, we verified the OPLS-DA models with a permutation test. Fig. 3D–F shows that the OPLS-DA models built from the metabolite data in this experiment were not overfitted. Variable importance in projection (VIP) was used to measure how strongly the difference in accumulation of each metabolite influences and explains the classification and discrimination of each group of samples. VIP ≥ 1 is a common screening standard for differential metabolites: the larger the VIP value, the greater the contribution to the classification in the OPLS-DA model and the greater the difference in the content of the compound between the two groups. A total of 852 differential metabolites were preliminarily detected across the three comparisons. Between C. sinensis and C. cicadae, 294 differential metabolites were detected, of which 188 were significantly upregulated and 106 significantly downregulated. Between C. sinensis and C. militaris, 284 were detected, of which 173 were significantly upregulated and 111 significantly downregulated. Between C. sinensis and C. gunnii, 274 were detected, of which 142 were significantly upregulated and 132 significantly downregulated.

Comparative analysis of metabolites

Sixteen classes of metabolites were detected in Cordyceps fungi by the LC-QTOF-MS method. In C. sinensis, 1345 metabolites were identified (Fig. 4A,B); the three largest classes were amino acids and their metabolites (222, 16%), organic acids and their derivatives (128, 9%) and nucleotides and their metabolites (89, 7%). In C. cicadae, 1476 metabolites were identified; the three largest classes were amino acids and their metabolites (238, 16%), organic acids and their derivatives (132, 9%) and nucleotides and their metabolites (95, 6%). In C. militaris, 1321 metabolites were identified; the three largest classes were amino acids and their metabolites (236, 18%), organic acids and their derivatives (113, 9%) and nucleotides and their metabolites (99, 8%). In C. gunnii, 1390 metabolites were identified; the three largest classes were amino acids and their metabolites (228, 16%), organic acids and their derivatives (124, 9%) and nucleotides and their metabolites (90, 7%). The similarity of the metabolic profiles of C. cicadae, C. militaris and C. gunnii to that of C. sinensis was 94.42%, 80.82% and 91.00%, respectively.

Figure 4

Analysis of total metabolites in C. sinensis, C. cicadae, C. militaris and C. gunnii. (A) UpSet Venn diagram of the total metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii. (B) Pie charts of the proportions of metabolite classes in the samples of C. sinensis (a), C. cicadae (b), C. militaris (c) and C. gunnii (d). (C) Cluster heat map of total metabolites in the samples of C. sinensis, C. cicadae, C. militaris and C. gunnii. (D) Volcano plots of the differential metabolites between C. sinensis and C. cicadae (a), C. militaris (b) and C. gunnii (c).

Univariate statistical analysis was then used to further screen the differential metabolites by fold change and p value. The screening criteria were fold change ≥ 2 or fold change ≤ 0.5, together with p value < 0.05 (Fig. 4D). A total of 204 differential metabolites were detected across the three comparisons. Between C. sinensis and C. cicadae, 68 differential metabolites were detected, of which 47 were significantly upregulated and 21 significantly downregulated. Between C. sinensis and C. militaris, 66 were detected, of which 43 were significantly upregulated and 23 significantly downregulated. Between C. sinensis and C. gunnii, 70 were detected, of which 50 were significantly upregulated and 20 significantly downregulated.
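The screening rule can be illustrated with a short Python sketch. It assumes that the fold-change criterion and the p-value criterion must both hold, and that fold change is expressed relative to C. sinensis; the function names and the example values are hypothetical and are not data from this study.

```python
def classify_metabolite(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label one metabolite as 'up', 'down' or 'ns' (not significant)."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    if p_value >= p_cutoff:
        return "ns"
    if fold_change >= fc_cutoff:
        return "up"
    if fold_change <= 1.0 / fc_cutoff:  # 0.5 when the cutoff is 2
        return "down"
    return "ns"


def summarise(records):
    """Count the labels for an iterable of (fold_change, p_value) pairs."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for fold_change, p_value in records:
        counts[classify_metabolite(fold_change, p_value)] += 1
    return counts


# Hypothetical (fold change, p value) pairs, for illustration only.
example = [(3.1, 0.002), (0.4, 0.01), (2.5, 0.20), (1.2, 0.001)]
print(summarise(example))
```

The third example pair has a large fold change but fails the p-value criterion, so it is not counted as differential.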

To show the differences in compounds among C. sinensis, C. cicadae, C. militaris and C. gunnii more clearly, the metabolite content data of the four Cordyceps species were normalized by the range method using R software (http://www.r-project.org/). Hierarchical cluster analysis (HCA) was then conducted on the metabolite accumulation patterns of the different samples. As shown in Fig. 4C, each rectangle in the heatmap represents a metabolite, coloured according to a standardized scale from −2.0 (low) to 2.0 (high), with red indicating higher-than-average and blue lower-than-average relative content. The dendrogram reveals subclusters containing different numbers of metabolites with varying degrees of similarity, with biochemically related compounds commonly clustered together; the compounds are arranged horizontally and the four Cordyceps species vertically.

Pearson correlation analysis was performed on all metabolites detected by metabolomics (Fig. 5C). The heatmap showed p values < 0.05 for C. sinensis, C. cicadae, C. militaris and C. gunnii both within and between groups, indicating significant correlations between samples within and between groups. The correlation coefficient (r) was used to evaluate the strength of the correlation: the closer r is to 1, the stronger the correlation. The correlation coefficient between C. sinensis and C. gunnii ranged from 0.8 to 1.0, showing a strong positive correlation. The correlation coefficients between C. sinensis and C. cicadae and between C. gunnii and C. cicadae were between 0.6 and 0.8, also showing a strong positive correlation. The correlation coefficients between C. sinensis and C. militaris, C. militaris and C. cicadae, and C. militaris and C. gunnii were in the range of 0.4–0.6, showing a moderate positive correlation.
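For readers who want to reproduce this kind of comparison, the following is a minimal Python sketch of the Pearson coefficient and of the descriptive bands used in this section (0.6 and above strong, 0.4–0.6 moderate, 0.2–0.4 weak, below 0.2 very weak). The function names and the example vectors are hypothetical, and how values exactly on a band boundary are labelled is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Map |r| to the descriptive bands used in the text."""
    strength = abs(r)
    if strength >= 0.6:
        return "strong"
    if strength >= 0.4:
        return "moderate"
    if strength >= 0.2:
        return "weak"
    return "very weak"


# Hypothetical relative contents of four metabolites in two samples.
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.1, 3.9, 6.2, 7.8]
r = pearson_r(sample_a, sample_b)
print(round(r, 3), describe(r))
```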

Figure 5

Analysis of nucleoside metabolites of Cordyceps. (A) Venn diagram of nucleoside metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii. (B) Histogram of the relative contents of major components in nucleoside metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii. (C) Pearson correlation heat map of all metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii. (D) Pearson correlation heat map of nucleoside metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii.

Many of the bioactivities reported for species of the genus Cordyceps have been associated with the presence of nucleosides. Most of the nucleosides reported from Cordyceps species were isolated from C. sinensis, C. militaris and C. cicadae. These nucleosides display potent anticancer, antiviral, neuroprotective, anti-inflammatory, antitumor and antioxidant activities36. Comparing the nucleoside metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii (Fig. 5A) shows that 84 nucleoside metabolites are found in all four species, and that there is 1 specific nucleoside compound in both C. militaris and C. gunnii. The Pearson correlation results for all nucleosides (Fig. 5D) were ranked from strong to weak. The correlation coefficients between C. gunnii and C. cicadae, C. sinensis and C. gunnii, C. sinensis and C. cicadae, and C. militaris and C. cicadae were in the range of 0.8–1.0, showing a strong positive correlation. The correlation coefficients between C. sinensis and C. militaris and between C. militaris and C. gunnii were between 0.6 and 0.8, also showing a strong positive correlation. In the Pearson correlation heatmap of the 9 main nucleoside components (Supplementary Fig. S2), p values were < 0.05 and correlation coefficients (r) were > 0.99 within the C. sinensis, C. cicadae, C. militaris and C. gunnii groups and between C. gunnii and C. cicadae, indicating significant, strong positive correlations. The remaining pairs, ranked from the strongest to the weakest correlation, were: C. sinensis and C. cicadae, C. sinensis and C. gunnii, C. militaris and C. gunnii, C. militaris and C. cicadae, C. sinensis and C. militaris. As shown in Fig. 5B, the relative contents of the 9 main nucleoside components of C. sinensis, C. cicadae, C. militaris and C. gunnii decreased in the order adenosine > guanosine > inosine > hypoxanthine > thymine > 3′-deoxyadenosine > uracil > thymidine > adenine. This result is consistent with the Chinese Pharmacopoeia, in which adenosine is the marker component for the content determination of C. sinensis. As shown in Table 1, the nucleoside compounds detected differed among the Cordyceps species, as did their relative contents. The relative contents of nucleoside compounds in C. sinensis, C. cicadae, C. militaris and C. gunnii were 2.22E+09 ± 1.60E+08, 2.12E+09 ± 1.08E+08, 2.80E+09 ± 8.99E+07 and 1.82E+09 ± 6.95E+07, respectively. However, the relative content of adenosine (Fig. 5B) followed the order C. sinensis > C. gunnii > C. cicadae > C. militaris. These results show that both C. gunnii and C. cicadae correlate strongly with C. sinensis in total metabolites and nucleoside compounds. Cordyceps gunnii and C. cicadae are therefore promising medicinal substitutes for C. sinensis that could provide new medicinal sources and alleviate the depletion of C. sinensis resources.

Figure 6

(A) Interspecific correlation heat map of C. sinensis, C. cicadae, C. militaris and C. gunnii based on eco-climate factors. (B) Results of the Procrustes analysis of eco-climate factors and nucleoside metabolites (type1: eco-climate factors; type2: nucleoside metabolites). (C) Network diagram of correlations between eco-climate factors and nucleoside metabolites. (D) Correlation network diagram of eco-climate factors and major components in nucleoside metabolites.

Table 1 Relative contents of nucleosides in C. sinensis, C. cicadae, C. militaris and C. gunnii determined by LC-QTOF-MS/MS.

Ecological niche modelling of Cordyceps species

The optimized parameters were used to simulate the suitable growth areas of C. sinensis, C. cicadae, C. militaris and C. gunnii. The optimized feature class (FC) and regularization multiplier (RM) were FC = LQ and RM = 0.1 for C. sinensis, FC = QP and RM = 0.7 for C. cicadae, FC = LQ and RM = 0.2 for C. militaris, and FC = P and RM = 2.3 for C. gunnii. With these settings, the delta AICc values of the models were all 0 (Supplementary Table 1S). The weight of each ecological factor in habitat suitability was analysed by the jackknife method in the MaxEnt model. The percentage contributions of each bioclimatic variable for C. sinensis, C. cicadae, C. militaris and C. gunnii are shown in Table 2. All values are averages of 10 replicates. For C. sinensis, the top three contributing variables were the mean temperature of warmest quarter (bio10), temperature seasonality (standard deviation × 100) (bio4) and annual mean temperature (bio1), with relative contributions of 27.86%, 26.33% and 19.39%, respectively. For C. cicadae, the top three contributing variables were the min temperature of coldest month (bio6), precipitation of coldest quarter (bio19) and isothermality (bio3), with relative contributions of 75.09%, 4.74% and 4.71%, respectively. For C. militaris, the top three contributing variables were annual precipitation (bio12), temperature seasonality (standard deviation × 100) (bio4) and precipitation of warmest quarter (bio18), with relative contributions of 44.51%, 13.23% and 12.09%, respectively. For C. gunnii, the top three contributing variables were the min temperature of coldest month (bio6), temperature seasonality (standard deviation × 100) (bio4) and the mean temperature of coldest quarter (bio11), with relative contributions of 33.37%, 30.77% and 19.21%, respectively.
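The role of delta AICc in choosing FC and RM can be sketched as follows. This is a minimal illustration of the standard small-sample corrected AIC, AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), and of ranking candidates by their difference from the best score. How the log-likelihood and the parameter count are extracted from a fitted MaxEnt model is outside this sketch, and all candidate values below are hypothetical.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Small-sample corrected Akaike information criterion."""
    if n_samples - n_params - 1 <= 0:
        raise ValueError("AICc needs n_samples > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_samples - n_params - 1)
    return aic + correction


def rank_by_delta_aicc(candidates, n_samples):
    """Return [(settings, delta_aicc), ...] sorted from best to worst.

    candidates maps a settings tuple such as ("LQ", 0.1) to a pair
    (log_likelihood, n_params).
    """
    scores = {
        settings: aicc(log_lik, k, n_samples)
        for settings, (log_lik, k) in candidates.items()
    }
    best = min(scores.values())
    deltas = [(settings, score - best) for settings, score in scores.items()]
    return sorted(deltas, key=lambda item: item[1])


# Hypothetical candidates: (feature class, regularization multiplier).
candidates = {
    ("LQ", 0.1): (-1200.0, 12),
    ("LQ", 1.0): (-1215.0, 8),
    ("P", 2.3): (-1230.0, 5),
}
ranking = rank_by_delta_aicc(candidates, n_samples=150)
print(ranking[0])  # the selected setting has delta AICc = 0
```

By construction the best candidate always has a delta AICc of 0, which is why the selected settings for all four species report that value.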

Table 2 Ecological climatic variables and niche model contribution of C. sinensis, C. cicadae, C. militaris and C. gunnii.

The response curves show how the MaxEnt prediction changes with each environmental variable. Based on the response curves of presence probability to the environmental variables in the MaxEnt model (Supplementary Fig. S5), and taking a presence probability greater than 0.5 as the criterion for suitable area, the thresholds of the dominant environmental variables affecting the distribution of suitable areas for Cordyceps are as follows. For C. sinensis, the ranges of the top three contributing variables bio10, bio4 and bio1 are 7.42–13.24 °C, 625.29–802.65 and 2.90–3.82 °C, respectively. For C. cicadae, bio6 is between −3.86 and 3.86 °C, bio19 is greater than 130.90 mm and bio3 is greater than 34.07. For C. militaris, bio12 is greater than − 1250.23 mm, bio18 is greater than 496.16 mm and bio4 is less than 767.83. For C. gunnii, bio6 is greater than −1.25 °C, bio4 is less than 782.79 and bio11 is greater than 4.19 °C.

The ROC curves of the MaxEnt models for Cordyceps are shown in Fig. 7. The average AUC values for C. sinensis, C. cicadae, C. militaris and C. gunnii are 0.957, 0.948, 0.972 and 0.921, respectively. These results show that the models are highly reliable and accurate in predicting the habitat suitability of Cordyceps.
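As a reminder of what these AUC values measure, the area under the ROC curve equals the probability that a randomly chosen presence point receives a higher predicted score than a randomly chosen comparison point. The sketch below computes it by direct pairwise comparison; the use of background points as the comparison set is an assumption, and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """AUC as the share of (presence, background) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical predicted suitabilities, for illustration only.
presence = [0.9, 0.8, 0.6, 0.7]
background = [0.1, 0.3, 0.65, 0.2, 0.4]
print(auc(presence, background))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, so values above 0.9 indicate that presence points are almost always ranked above the comparison points.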

Figure 7

ROC curve of MaxEnt model of Cordyceps. (A) ROC curve of MaxEnt model of C. sinensis. (B) ROC curve of MaxEnt model of C. cicadae. (C) ROC curve of MaxEnt model of C. militaris. (D) ROC curve of MaxEnt model of C. gunnii.

Figure 8A–D shows the potential distribution of C. sinensis, C. cicadae, C. militaris and C. gunnii. The models are based on current climate conditions, and the logistic output (LO) generated by MaxEnt software is expressed as a probability ranging from 0 to 1. Using the reclassification tool of ArcMap 10.7, the simulation results were divided into four levels: 0–0.2 was considered unsuitable, 0.2–0.45 low suitability, 0.45–0.7 medium suitability and 0.7–1 high suitability.
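The four-level reclassification amounts to a simple threshold lookup, sketched below in Python. The text does not say which class a value exactly on a boundary belongs to, so assigning it to the higher class is an assumption; the function names and cell values are hypothetical, and the share calculation treats all grid cells as having equal area.

```python
from collections import Counter

CLASS_NAMES = ("unsuitable", "low", "medium", "high")


def suitability_class(probability):
    """Map a logistic output in [0, 1] to one of the four levels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < 0.2:
        return "unsuitable"
    if probability < 0.45:
        return "low"
    if probability < 0.7:
        return "medium"
    return "high"


def class_shares(probabilities):
    """Fraction of grid cells in each class (equal-area cells assumed)."""
    if not probabilities:
        raise ValueError("no cells given")
    counts = Counter(suitability_class(p) for p in probabilities)
    return {name: counts.get(name, 0) / len(probabilities) for name in CLASS_NAMES}


# Hypothetical cell values, for illustration only.
cells = [0.05, 0.2, 0.44, 0.45, 0.69, 0.7, 0.95, 0.1]
print(class_shares(cells))
```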

Figure 8

(A) Current situation and potential prediction of spatial suitability distribution of C. sinensis in China. (a) Histogram of urban area proportion in high (Class1) and medium (Class2) suitable distribution areas of C. sinensis in China at present. (b) Current map of suitable distribution areas of C. sinensis in China. (c–f) Map of suitable distribution areas of C. sinensis in China from 2021 to 2040, 2041 to 2060, 2061 to 2080 and 2081 to 2100. (B) Current situation and potential prediction of spatial suitability distribution of C. cicadae in China. (a) Histogram of urban area proportion in high (Class1) and medium (Class2) suitable distribution areas of C. cicadae in China at present. (b) Current map of suitable distribution areas of C. cicadae in China. (c–f) Map of suitable distribution areas of C. cicadae in China from 2021 to 2040, 2041 to 2060, 2061 to 2080 and 2081 to 2100. (C) Current situation and potential prediction of spatial suitability distribution of C. militaris in China. (a) Histogram of urban area proportion in high (Class1) and medium (Class2) suitable distribution areas of C. militaris in China at present. (b) Current map of suitable distribution areas of C. militaris in China. (c–f) Map of suitable distribution areas of C. militaris in China from 2021 to 2040, 2041 to 2060, 2061 to 2080 and 2081 to 2100. (D) Current situation and potential prediction of spatial suitability distribution of C. gunnii in China. (a) Histogram of urban area proportion in high (Class1) and medium (Class2) suitable distribution areas of C. gunnii in China at present. (b) Current map of suitable distribution areas of C. gunnii in China. (c–f) Map of suitable distribution areas of C. gunnii in China from 2021 to 2040, 2041 to 2060, 2061 to 2080 and 2081 to 2100. The figure was produced with ArcGIS (version 10.7, https://www.esri.com/zh-cn/arcgis/) and MaxEnt software (version 3.4.1, https://biodiversityinformatics.amnh.org/open_source/maxent/).

At present, cities in China with high and medium suitability for C. sinensis (Fig. 8A) are mainly distributed in Sichuan, Xizang (Tibet), Qinghai, Gansu and Yunnan, covering an area of 327,218 km2 and accounting for 88.94% of the total area of C. sinensis. Cities with high and medium suitability for C. cicadae (Fig. 8B) are mainly distributed in Yunnan, Jiangxi, Xizang, Fujian and Guizhou, covering an area of 274,115 km2 and accounting for 74.51% of the total area of C. cicadae. Cities with high and medium suitability for C. militaris (Fig. 8C) are mainly distributed in Guangxi, Xizang, Yunnan, Taiwan and Guangdong, covering an area of 360,226 km2 and accounting for 97.91% of the total area of C. militaris. Cities with high and medium suitability for C. gunnii (Fig. 8D) are mainly distributed in Guizhou, Sichuan, Guangxi, Anhui and Hubei, covering an area of 238,971 km2 and accounting for 64.96% of the total area of C. gunnii. According to the future predictions, the suitable area of C. cicadae showed a significant decline. In contrast, the suitable area of C. sinensis increased significantly during 2021–2040 and only slightly during 2041–2060, 2061–2080 and 2081–2100, by which time it was essentially stable. The suitable area of C. militaris increased in the three periods 2021–2040, 2041–2060 and 2061–2080 and decreased sharply in 2081–2100. The suitable area of C. gunnii, like that of C. sinensis, initially increased from 2021 to 2040; it then decreased from 2041 to 2060 and finally increased slowly from 2061 to 2100. In addition, Table 3 shows that in the four future periods (2021–2040, 2041–2060, 2061–2080, 2081–2100), the centroid shift distances of C. sinensis are 141.32, 119.06, 135.63 and 123.91 km, those of C. militaris are 367.43, 371.14, 343.71 and 376.60 km, those of C. cicadae are 76.52, 78.22, 157.38 and 80.15 km, and those of C. gunnii are 75.84, 77.28, 69.61 and 77.73 km, respectively. The centroids of the suitable areas of C. sinensis, C. cicadae, C. militaris and C. gunnii all shifted in the same direction, to the northwest (Supplementary Fig. 6S).
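A centroid shift can be summarised by a distance and a direction. The sketch below uses the haversine great-circle distance and the initial compass bearing on a spherical Earth; this is an assumption made for illustration, since the text does not state how the distances in Table 3 were computed (a GIS workflow may instead measure them in a projected coordinate system). The coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    y = math.sin(d_lambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda))
    return math.degrees(math.atan2(y, x)) % 360.0


# Hypothetical current and future centroids (latitude, longitude).
current = (30.0, 100.0)
future = (31.0, 99.0)
distance = haversine_km(*current, *future)
bearing = initial_bearing_deg(*current, *future)
print(round(distance, 1), round(bearing, 1))
```

A bearing between 270° and 360° corresponds to a shift towards the northwest.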

Table 3 Centroid migration of the suitable areas of C. sinensis, C. cicadae, C. militaris and C. gunnii in four future periods (2021–2040, 2041–2060, 2061–2080, 2081–2100).

In conclusion, under future climate conditions, the suitable area of C. cicadae will gradually decrease, whereas the suitable areas of C. sinensis, C. gunnii and C. militaris will generally increase. The suitable areas of C. sinensis, C. cicadae, C. militaris and C. gunnii all migrated to the northwest. Pearson correlation analysis of the 13 bioclimatic variables (Fig. 6A) showed strong positive correlations between several groups; for C. cicadae and C. militaris, the correlation coefficient was between 0.8 and 1.0. In addition, there was a strong positive correlation between C. sinensis and C. cicadae and between C. sinensis and C. gunnii. There was a weak positive correlation between C. gunnii and C. militaris and between C. militaris and C. sinensis; the correlation coefficient was between 0.2 and 0.4. There was a very weak positive correlation between C. sinensis and C. militaris, with a correlation coefficient between 0.0 and 0.2. The results indicated that the bioclimatic environments of C. sinensis, C. gunnii and C. cicadae were similar.

Correlation analysis of climate factors and metabolites

Procrustes analysis is used to analyse the correlation between two groups of data37 by comparing their consistency. In Fig. 6B, different colours represent different groups. The points on the principal orthogonal axes are the sample points from the climate factor PCA, represented by solid circles. The points on the oblique orthogonal axes are the sample points from the nucleoside compound PCA, represented by triangles. Each line connects the two matched samples, and its length is the residual between them. The value of M2, the sum of squared residuals, is 0.096. The smaller the value of M2, the better the consistency between the climate factors of C. sinensis, C. cicadae, C. militaris and C. gunnii and the nucleoside compound data. The p value of M2 calculated by the permutation test is 0.001.
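The M2 statistic and its permutation p value can be illustrated with the Python sketch below. It assumes two-dimensional ordinations (for example, the first two PCA axes), symmetric Procrustes scaling in which both configurations are centred and scaled to unit sum of squares, and 999 permutations; none of these settings is stated in the text, and the coordinates are hypothetical. With 999 permutations, the smallest p value this scheme can return is 1/1000 = 0.001.

```python
import math
import random


def _standardise(points):
    """Centre (x, y) pairs and scale them to unit sum of squares."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centred))
    if norm == 0:
        raise ValueError("all points coincide")
    return [(x / norm, y / norm) for x, y in centred]


def procrustes_m2(a, b):
    """Symmetric Procrustes M^2 for two matched 2-D configurations."""
    if len(a) != len(b):
        raise ValueError("configurations must have the same number of points")
    xa, xb = _standardise(a), _standardise(b)
    c11 = sum(p[0] * q[0] for p, q in zip(xa, xb))
    c12 = sum(p[0] * q[1] for p, q in zip(xa, xb))
    c21 = sum(p[1] * q[0] for p, q in zip(xa, xb))
    c22 = sum(p[1] * q[1] for p, q in zip(xa, xb))
    frob2 = c11 ** 2 + c12 ** 2 + c21 ** 2 + c22 ** 2
    det = c11 * c22 - c12 * c21
    # For a 2x2 matrix the singular values satisfy
    # (s1 + s2)^2 = ||C||_F^2 + 2|det C|, and M^2 = 1 - (s1 + s2)^2.
    return max(0.0, 1.0 - (frob2 + 2.0 * abs(det)))


def permutation_p(a, b, n_perm=999, seed=0):
    """Return (M^2, p), shuffling the sample matching n_perm times."""
    rng = random.Random(seed)
    observed = procrustes_m2(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_m2(a, shuffled) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical ordination: the second set is a rotated, scaled and
# shifted copy of the first, so M^2 should be close to 0.
first = [(0, 0), (1, 0.2), (2, 1.5), (3, 0.5), (4, 3), (5, 1), (6, 4.5), (7.5, 2)]
theta = math.radians(30)
second = [(2 * (x * math.cos(theta) - y * math.sin(theta)) + 5,
           2 * (x * math.sin(theta) + y * math.cos(theta)) - 3) for x, y in first]
m2, p = permutation_p(first, second)
print(m2, p)
```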

Correlation coefficients and p values between the 13 climate factors and the biomass and relative contents of all metabolites were calculated; correlations with an absolute coefficient of at least 0.8 (on a scale of 0 to 1) and a p value threshold of 0.01 were considered significant. Cytoscape was used to map the network. As shown in Fig. 6C, 13 climate factors were significantly correlated with 75 nucleoside metabolites, including 30 nucleoside compounds related to bio5, 27 related to bio4, 24 related to bio10 and 19 related to bio1. There were 18 nucleosides related to bio12, bio16 and bio18; 16 to bio3; 15 to bio6; 14 to bio19; and 13 to bio14. There were 8 nucleoside compounds related to bio11 and 3 related to bio8. As shown in Supplementary Fig. S3, different compounds have different correlations with different climate factors. 3′-Deoxyadenosine (cordycepin, MEDP1955), 5′-deoxyadenosine (MEDP1018) and 2-deoxyinosine (MEDP0150) were fully correlated with bio5 (r = 1). Among the main nucleoside compounds, uracil (MEDN0169) was significantly correlated with bio5 and bio4, 3′-deoxyadenosine (cordycepin, MEDP1955) with bio4, thymine (MEDP0177) with bio5 and bio4, adenosine (MEDP0160) with bio4 and bio5, and inosine (MEDN1003) with bio6. The results showed that bio4 and bio5 were the climatic factors with the greatest influence on nucleoside compounds.

Pearson analysis of the 9 principal nucleoside compounds and the 13 climate factors (correlation heatmap, Supplementary Fig. S4) showed that seven compounds were significantly correlated with climate factors (p value < 0.05): adenine, thymidine, uracil, inosine, adenosine, thymine and 3′-deoxyadenosine. The pairs that also had a correlation coefficient > 0.6 were bio4 with uracil, adenosine, thymine and 3′-deoxyadenosine; bio5 with uracil, adenosine, thymine and 3′-deoxyadenosine; bio6 with inosine; bio8 with adenine, inosine and thymine; bio10 with thymidine; bio11 with uracil; bio14 with inosine; bio18 with thymidine; and bio19 with inosine. The correlation network (Fig. 6D) of the 9 principal nucleoside compounds and the 13 climatic factors was consistent with the above results: bio5 and bio4 were the most influential climatic factors on the compounds, playing a decisive role in uracil, adenosine, thymine and 3′-deoxyadenosine, while bio6 had a significant correlation with inosine. The important roles of bio5 and bio4 in nucleoside compounds were thus confirmed.

Discussion

The quality of TCM mainly depends on the type and content of its main active chemical constituents, which are affected by many factors and fluctuate. Cordyceps is a rare TCM that can be used as both medicine and food in China. Of all available literature on Cordyceps, reports on C. sinensis and C. militaris account for 60%. Cordyceps militaris has been used as a tonic in China for hundreds of years, has been reported to have health benefits similar to those of C. sinensis and is used as an alternative38,39,40,41. Research performed on C. cicadae over the past two decades has shown that it possesses biological properties and bioactive compounds similar to those of C. sinensis and C. militaris and has suggested that it can be used as an alternative source of Cordyceps13. In addition, Cordyceps gunnii is popularly referred to as Chinese rare caterpillar fungus18. Therefore, C. cicadae, C. militaris and C. gunnii were selected as candidate substitutes in this study. Metabolomics was used for the first time to detect and analyse all metabolites of C. sinensis, C. cicadae, C. militaris and C. gunnii, and their suitable distribution was predicted with an ecological niche model. The MaxEnt model was used to predict the suitable habitat areas of Cordyceps, and ArcGIS was used for visual analysis. Correlation analysis of chemical components and eco-climatic factors was conducted to explore the effects of climate factors on the main active components of Cordyceps; temperature seasonality (bio4) and max temperature of the warmest month (bio5) had the most prominent effects on nucleosides. The results showed that the predicted distribution areas of C. sinensis, C. cicadae and C. gunnii were consistent with the recorded distributions, while C. militaris was also predicted to be distributed in Guangdong and Taiwan, indicating that artificial cultivation of Cordyceps could be considered in these areas. In the predicted future Cordyceps suitability distribution, the suitable area for C. cicadae generally decreases, which may be due to the lack of artificial cultivation. In contrast, the suitable areas for C. sinensis, C. militaris and C. gunnii generally rise gently, which may alleviate the problem of resource exhaustion, although the overall effect is limited. However, in addition to climate variables, other factors, including soil, terrain, biological factors and human activities, also affect the content of nucleoside compounds in Cordyceps and are key to the formation of medicinal plant metabolites. Further studies incorporating these factors into the model will improve the prediction of possible changes in the distribution of Cordyceps in China under changing climatic conditions. In this study, we successfully established potential distribution models for C. sinensis, C. cicadae, C. militaris and C. gunnii under changing climate variables and estimated the impact of climate change on the suitable habitat of Cordyceps.

Cross-analysis of the various techniques showed strong correlations among C. sinensis, C. cicadae and C. gunnii in both compounds and climate factors, indicating that their growing conditions and nucleoside compounds were similar. We suggest that experts consider C. cicadae and C. gunnii as alternative medicinal sources to C. sinensis and include them in the pharmacopoeia. The results provide a basis for the sustainable development of C. sinensis resources and offer a new way to evaluate the impact of climate factors on the quality of TCM.

Methods

Fungal sample materials and coordinate data sources

Cordyceps sinensis, C. cicadae, C. militaris and C. gunnii were purchased in 2014 from Yunnan Province, Anhui Province, Liaoning Province and Guizhou Province, respectively (Fig. 2), and were identified by Professor Linfang Huang, Institute of Medicinal Plants, China Academy of Chinese Medical Sciences and Peking Union Medical College. The samples were stored at −20 °C. Three biological replicates were analysed for each species, giving a total of 12 samples. Specific sample information is shown in Table 4.

Table 4 Source information of C. sinensis, C. cicadae, C. militaris and C. gunnii samples.

The coordinate data of C. sinensis, C. cicadae, C. militaris and C. gunnii were mainly obtained from (1) the Chinese Virtual Herbarium (CVH, https://www.cvh.ac.cn/), (2) the National Specimen Information Infrastructure (NSII, http://www.nsii.org.cn/2017/home.php), (3) the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/), (4) published literature, and (5) records from the sampling sites. Retrieval from these sources yielded more than 4600 occurrences. Because the locations of most sample points were imprecise, the detailed geographic location of sample collection could not be accurately expressed. We therefore used the Trim Duplicate Occurrences tool of ENMTools V1.4.442 and Baidu tools (http://api.map.baidu.com/lbsapi/getpoint/index.html) to confirm the geographic coordinates of each distribution record, eliminate specimens without location coordinates and duplicate data, and filter the species data. The spatial resolution of the bioclimatic variables was 2.5 arc-min (grid cells of approximately 4.5 km × 4.5 km), and the buffer distance was set to 3 km: only one distribution point was retained when the distance between distribution points was less than 3 km. Removing duplicate points reduces the error caused by clustering when predicting potential distribution areas and reduces model overfitting43,44. Spatial autocorrelation, caused by overly dense sampling in some regions, leads to overfitting45. These operations can greatly reduce the spatial autocorrelation of species occurrence data and effectively reduce the error46. Finally, 2529 occurrence records of C. sinensis, C. cicadae, C. militaris and C. gunnii were retained (Fig. 2).
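The 3 km filtering rule can be sketched as follows. This is a simple greedy thinning based on great-circle distance, written only to illustrate the rule; it is not the ENMTools implementation, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def thin_occurrences(points, min_dist_km=3.0):
    """Keep a record only if it has coordinates and lies at least min_dist_km
    from every record already kept (exact duplicates are dropped as well)."""
    kept = []
    for lat, lon in points:
        if lat is None or lon is None:
            continue  # records without coordinates are eliminated
        if all(haversine_km(lat, lon, k_lat, k_lon) >= min_dist_km for k_lat, k_lon in kept):
            kept.append((lat, lon))
    return kept
```

The result of greedy thinning depends on the order of the input records, so the records should be sorted or shuffled reproducibly before the filter is applied.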

Cordyceps sample extraction

The sample was thawed on ice. Cold steel balls were added, and the sample was homogenized at 30 Hz for 3 min. Then, 1 mL of 70% methanol containing the internal standard was added to the centrifuge tube with the homogenate, and the mixture was vortexed for 5 min and then centrifuged at 12,000 rpm at 4 °C for 10 min. After centrifugation, 400 µL of supernatant was transferred to the corresponding EP tube, stored at −20 °C overnight and centrifuged at 12,000 rpm at 4 °C for 3 min; 200 µL of the supernatant was then transferred to the insert of the corresponding injection vial for instrumental analysis.

Conditions of liquid chromatography and mass spectrometry

T3 UPLC Conditions: The sample extracts were analysed using an LC‒ESI‒MS/MS system (UPLC, ExionLC AD, https://sciex.com.cn/; MS, QTRAP® System, https://sciex.com/). The analytical conditions were as follows: UPLC: column, Waters ACQUITY UPLC HSS T3 C18 (1.8 μm, 2.1 mm × 100 mm); column temperature, 40 °C; flow rate, 0.35 mL/min; injection volume, 5 µL; solvent system, water (0.1% formic acid): acetonitrile (0.1% formic acid); gradient program, 95:5 V/V at 0 min, 10:90 V/V at 10.0 min, 10:90 V/V at 11.0 min, 95:5 V/V at 11.1 min, 95:5 V/V at 14.0 min.
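The gradient program can be written as a small lookup. The sketch below assumes that the solvent composition changes linearly between the listed time points, which the text does not state explicitly; the names are illustrative.

```python
# (time in min, % of the aqueous phase: water with 0.1% formic acid), taken from the program above
GRADIENT = [(0.0, 95.0), (10.0, 10.0), (11.0, 10.0), (11.1, 95.0), (14.0, 95.0)]

def percent_aqueous(t_min):
    """Aqueous-phase percentage at time t_min, assuming linear segments between time points."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-14 min program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)

def percent_organic(t_min):
    """Acetonitrile (0.1% formic acid) percentage; the two phases sum to 100%."""
    return 100.0 - percent_aqueous(t_min)
```

Under this assumption, the composition halfway through the first segment (5 min) is 52.5:47.5 v/v.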

QTOF-MS/MS: A TripleTOF mass spectrometer was used because it can acquire MS/MS spectra on an information-dependent basis (IDA) during an LC/MS experiment. In this mode, the acquisition software (TripleTOF 6600, AB SCIEX) continuously evaluates the full-scan survey MS data as they are collected and triggers the acquisition of MS/MS spectra depending on preselected criteria. In each cycle, 12 precursor ions whose intensity was greater than 100 were chosen for fragmentation at a collision energy (CE) of 30 V (12 MS/MS events with a product ion accumulation time of 50 ms each). ESI source conditions were set as follows: ion source gas 1, 50 psi; ion source gas 2, 50 psi; curtain gas, 25 psi; source temperature, 500 °C; and ion spray voltage floating (ISVF), 5500 V or −4500 V in positive or negative mode, respectively (Supplementary Fig. S1A,B).

Identification and quantification of metabolites

The metabolites were identified and quantified by Jiaxing Maiwei Metabolic Biotechnology Co., Ltd. (Jiaxing, China). The LC-QTOF-MS/MS experiment was conducted based on the self-built targeted standard database MWDB (including secondary spectra and retention time, RT), the MHK database (integrating Metlin, HMDB and KEGG information, including secondary spectra and RT) and MetDNA, and the multi-ion pair information and RT of the identified metabolites were extracted. The multiple reaction monitoring (MRM) multipeak diagram of metabolite detection (Supplementary Fig. S1C,D) shows the substances that can be detected in the sample; each mass spectrum peak of a different colour represents a detected metabolite. The characteristic ions of each substance are screened by the triple quadrupole, and the signal intensity (CPS) of the characteristic ions is obtained in the detector. Then, the chromatographic peaks are integrated and corrected, and the peak area of each chromatographic peak represents the relative content of the corresponding substance. MRM analysis by triple quadrupole mass spectrometry was used to complete the metabolite quantification (Table 1). The filter conditions for differentially accumulated metabolites (DAMs) were as follows: |log2 (fold change)| ≥ 1, p value < 0.05, and variable importance in projection (VIP) ≥ 1. Principal component analysis (PCA) was performed on the samples to preliminarily understand the overall metabolic differences among groups and the degree of variation among samples within each group47. To investigate the accumulation of specific metabolites, PCA and OPLS-DA were performed using R (http://www.R-project.org/).
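The three DAM filter conditions can be expressed as a short sketch. It assumes that the group means, the p value and the VIP score of each metabolite have already been computed elsewhere (for example from the OPLS-DA model); the record keys are hypothetical and do not correspond to a real output format.

```python
import math

def is_dam(mean_a, mean_b, p_value, vip):
    """Apply the three filters: |log2(fold change)| >= 1, p < 0.05 and VIP >= 1."""
    log2_fc = math.log2(mean_a / mean_b)  # fold change of group A relative to group B
    return abs(log2_fc) >= 1 and p_value < 0.05 and vip >= 1

def filter_dams(records):
    """Return the records that pass all three conditions.

    Each record is a dict with the hypothetical keys
    'id', 'mean_a', 'mean_b', 'p_value' and 'vip'.
    """
    return [
        r for r in records
        if is_dam(r["mean_a"], r["mean_b"], r["p_value"], r["vip"])
    ]
```

The absolute value makes the fold-change condition symmetric: a metabolite passes if its relative content at least doubles or at least halves between the two groups.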

Climatic factor data

WorldClim is a high spatial resolution global database of weather and climate data. These data can be used for mapping and spatial modelling. This research used the 19 bioclimatic variables (bio1-bio19) of WorldClim version 2.1 (https://www.worldclim.org/) as environmental factors. The historical (recent) climate data from 1970 to 2000 and the environmental layers of the 19 bioclimatic variables for future climate scenarios (2021–2040, 2041–2060, 2061–2080, 2081–2100) were downloaded from the WorldClim Global Climate Database (http://WorldClim.org/) at a spatial resolution of 2.5 arc min (grid cells of approximately 4.5 km × 4.5 km)48,49. Bioclimatic variables are used as baselines for future scenarios and represent annual trends, seasonality, and extreme or restrictive environmental factors. Multicollinearity among bioclimatic variables occurs easily; it increases the complexity of the model and distorts the response relationships and contribution estimates of the model. To overcome the multicollinearity problem, we examined the pairwise correlations between climate factors; when the absolute value of the correlation between two factors was | r | > 0.8, only one of them was entered into the model. Beforehand, all climate factors were input into the MaxEnt model to obtain the contribution of each factor, which served as the basis for eliminating collinear factors; that is, among strongly correlated factors, the factor with the higher contribution was retained. 
According to the results of factor correlation analysis, contribution evaluation and species suitability conditions, the climate variables screened out (Table 2) included annual mean temperature (bio1), isothermality (bio3), temperature seasonality (standard deviation × 100) (bio4), max temperature of warmest month (bio5), minimum temperature of coldest month (bio6), mean temperature of wettest quarter (bio8), mean temperature of warmest quarter (bio10), mean temperature of coldest quarter (bio11), annual precipitation (bio12), precipitation of driest month (bio14), precipitation of wettest quarter (bio16), precipitation of warmest quarter (bio18) and precipitation of coldest quarter (bio19), for a total of 13.
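The screening rule (| r | > 0.8, keep the factor with the higher contribution) can be sketched as a greedy procedure. This is one plausible reading of the rule and not necessarily the exact procedure followed here; the input dictionaries are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_variables(values, contribution, threshold=0.8):
    """Greedy collinearity filter.

    values: variable name -> values extracted at the occurrence points
    contribution: variable name -> MaxEnt contribution from a preliminary run
    Variables are visited from highest to lowest contribution, and a variable
    is kept only if |r| <= threshold with every variable already kept.
    """
    ranked = sorted(values, key=lambda name: contribution[name], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(values[name], values[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Visiting the variables in order of decreasing contribution guarantees that, within any strongly correlated pair, the variable with the higher contribution is the one retained.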

MaxEnt model

The free software MaxEnt V3.4.0 (http://bioversityinformatic.amnh.org/open_source/MaxEnt/) calculates the most likely potential geographic distribution of a species based on the relationship between geographic data and the known distribution of the target species50. Existing studies have shown that MaxEnt may not perform best under its default settings and overfits more easily when the sample size is small, so the model parameters need to be adjusted45,51,52. MaxEnt has multiple parameter settings, and statistical significance, predictive ability and model complexity should all be considered when determining the optimal parameters. Therefore, the optimal parameter combination should be found through model calibration so that the prediction results are as close as possible to the ideal suitability of the species45,53,54,55. In this study, following recent research, we used the R package kuenm to automatically calibrate and evaluate the important parameters of the MaxEnt model. The optimal settings were selected from 600 parameter combinations, which further improves the reliability of the predicted suitable areas.

The two most important parameters of MaxEnt are the feature combination (FC) and the regularization multiplier (RM), and optimal selection of these two parameters significantly improves the prediction accuracy of the model26,27,45,53,54,56. There are 5 feature classes for FC, namely linear (L), quadratic (Q), product (P), threshold (T) and hinge (H), and 15 different combinations were set (L, Q, P, H, LQ, LP, LH, QP, QH, PH, LQP, LQH, LPH, QPH, LQPH). RM is generally set to no more than 4; RM values from 0.1 to 4 at intervals of 0.1 were used, for a total of 40 values26. The kuenm package in R was used to run MaxEnt for 600 candidate models (15 FC settings freely combined with 40 RM values)53. This method first selects, from all candidate models, those that are statistically significant and have omission rates below 5%, and then, according to the corrected Akaike information criterion (AICc), selects the models with a Delta AICc value less than 2 as the recommended models53,55,57. If more than one recommended model is selected, the model with the smallest Delta AICc value is regarded as the optimal model.
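The size of the candidate set and the selection rule can be checked with a short sketch. It only enumerates the parameter grid and applies the three criteria to precomputed evaluation results; it does not run MaxEnt or kuenm. The dictionary keys are hypothetical, and computing Delta AICc among the models that pass the first two filters is an assumption.

```python
from itertools import combinations

def feature_combinations(classes="LQPH"):
    """All non-empty combinations of the given feature classes, in the order listed above."""
    return ["".join(c) for k in range(1, len(classes) + 1) for c in combinations(classes, k)]

def regularization_multipliers(start=0.1, stop=4.0, step=0.1):
    """RM values from start to stop (inclusive), rounded to one decimal place."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(n)]

def parameter_grid():
    """Every (FC, RM) pair: 15 x 40 = 600 candidate models."""
    return [(fc, rm) for fc in feature_combinations() for rm in regularization_multipliers()]

def select_model(candidates, alpha=0.05, max_omission=0.05, max_delta=2.0):
    """Pick the optimal candidate from precomputed evaluation results.

    Each candidate is a dict with the hypothetical keys
    'fc', 'rm', 'p_value', 'omission_rate' and 'aicc'.
    """
    passing = [c for c in candidates
               if c["p_value"] < alpha and c["omission_rate"] < max_omission]
    if not passing:
        return None
    best_aicc = min(c["aicc"] for c in passing)
    recommended = [c for c in passing if c["aicc"] - best_aicc < max_delta]
    return min(recommended, key=lambda c: c["aicc"])
```

Four feature classes give 2^4 − 1 = 15 non-empty combinations, and 15 × 40 = 600 candidate models, which matches the number stated above.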

Our modelling was performed according to the standard protocol for reporting species distribution models by Zurell et al.58. The feature parameters were set as linear, quadratic, product, threshold and hinge features, and “Create response curves”, “Make pictures of predictions” and “Do jackknife to measure variable importance” were selected to interpret how individual variables affect the probability of the presence of Cordyceps. In the basic settings, the “Random test percentage” was set to 25, so that 75% of the sample data were randomly selected as the training set and the remaining 25% were used as the test set to verify the model. The “Regularization multiplier” was set to 1 to prevent overcomplexity and reduce overfitting by controlling the intensity of the chosen feature classes. The “Maximum number of background points” was set to 10,000, and “Replicates” was set to 10. In the advanced settings, “Maximum iterations” was set to 500, and the “Convergence threshold” was set to 0.0001. The output format was set to “Cloglog”, which a previous study showed to be the optimal output mode for predicting the suitable area59. Receiver operating characteristic (ROC) curve analysis was used to verify the accuracy of the MaxEnt model. The ROC curve assesses model accuracy as the decision threshold changes: the false-positive rate (the probability that a site without the species is predicted as positive) is the abscissa, the true-positive rate (the probability that a site with the species is predicted as positive) is the ordinate, and the area enclosed by the curve and the abscissa is the AUC (a value between 0 and 1 that measures the accuracy of the model's predictions). 
It is generally believed that AUC < 0.6 indicates that the prediction fails, AUC > 0.8 indicates that the prediction is good, and AUC > 0.9 is considered excellent60. The importance of variables was assessed using the jackknife method. Response curves were used to obtain the suitable range of each bioclimatic variable. The spatial statistics functions of ArcGIS were used to grade the potential suitable areas of the Cordyceps species according to their actual distribution, and the areas and proportions of the different suitability grades were calculated.
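The AUC can be computed directly from labelled scores, as in the minimal sketch below. It uses the rank formulation, in which the AUC equals the probability that a randomly chosen positive site scores higher than a randomly chosen negative site. Treating background points as negatives and labelling the 0.6–0.8 band "not rated" are assumptions, since the text gives no grade for that interval.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve; labels are 1 (presence) or 0 (absence/background).

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rate_auc(auc):
    """Map an AUC value to the grades quoted in the text."""
    if auc > 0.9:
        return "excellent"
    if auc > 0.8:
        return "good"
    if auc < 0.6:
        return "fail"
    return "not rated"  # 0.6-0.8 is not graded in the text
```

For example, with labels [1, 1, 0, 0] and scores [0.9, 0.4, 0.5, 0.1], three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.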

Correlation analysis of climate factors and metabolites

ArcGIS was used to extract the 13 key eco-climatic factors. Correlation analysis was conducted between the selected bioclimatic variables involved in the growth of Cordyceps and the nucleoside metabolites in Cordyceps. The Pearson correlation coefficient (r) between climate factors and the relative contents of nucleoside compounds was calculated. The screening criterion required an absolute correlation coefficient of 0.8–1.0, indicating a very strong correlation; values of 0.6–0.8 indicate a strong correlation, 0.4–0.6 a moderate correlation, 0.2–0.4 a weak correlation, and 0–0.2 an extremely weak or no correlation. Cytoscape (Cytoscape Consortium, USA, version 3.7.1) was used to visualize networks of interactions between climate factors and differential nucleoside metabolites. The R package pheatmap was used to draw the heatmap.
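The screening step can be sketched as follows: compute Pearson's r for every climate factor and metabolite pair, grade it, and keep the pairs that meet the 0.8 threshold as network edges. The label "very strong" for the top band, the assignment of boundary values to the higher band and the dictionary layout are assumptions made for illustration; p values are not computed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Grade |r|; a value on a boundary is assigned to the higher band (an assumption)."""
    a = abs(r)
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "extremely weak or none"

def network_edges(climate, metabolites, min_abs_r=0.8):
    """Return (climate factor, metabolite, r) for every pair with |r| >= min_abs_r.

    climate and metabolites map a name to its values across the same samples.
    """
    edges = []
    for c_name, c_vals in climate.items():
        for m_name, m_vals in metabolites.items():
            r = pearson(c_vals, m_vals)
            if abs(r) >= min_abs_r:
                edges.append((c_name, m_name, round(r, 3)))
    return edges
```

The resulting edge list can be saved as a three-column table (source, target, weight) and imported into a network viewer such as Cytoscape.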

Statement of licences for plant material

This experimental research on cultivated plants complies with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora, and does not involve the collection of wild plants.