The gene expression profiles in response to 102 traditional Chinese medicine (TCM) components: a general template for research on TCMs

Traditional Chinese medicines (TCMs) have important therapeutic value in long-term clinical practice. However, because TCMs contain diverse ingredients and have complex effects on the human body, the molecular mechanisms of TCMs are poorly understood. In this work, we determined the gene expression profiles of cells in response to TCM components to investigate TCM activities at the molecular and cellular levels. MCF7 cells were separately treated with 102 different molecules from TCMs, and their gene expression profiles were compared with the Connectivity Map (CMAP). To demonstrate the reliability and utility of our approach, we used nitidine chloride (NC) from the root of Zanthoxylum nitidum, a topoisomerase I/II inhibitor and α-adrenoreceptor antagonist, as an example to study the molecular function of TCMs using CMAP data as references. We successfully applied this approach to the four ingredients in Danshen and analyzed the synergistic mechanism of TCM components. The results demonstrate that our newly generated TCM data and related methods are valuable in the analysis and discovery of the molecular actions of TCM components. This is the first work to establish gene expression profiles for the study of TCM components and serves as a template for general TCM research.


Results
Generation of gene expression profiles for 102 TCM components. The schematic view of data construction and processing is presented in Fig. 1. All molecules were derived from TCMs and were mainly active ingredients in Chinese herbs and TCM formulae. MCF7 cells were then treated with the derived molecules and total RNA was extracted for microarray analysis. Finally, the gene expression profiles of TCM components were established for the study of TCMs. It is possible that some gene expression profiles of TCM ingredients have been reported in previous studies. However, we produced the gene expression profiles on a unified platform that was more conducive to the collective analysis of the different ingredients. In total, 102 components are provided in Table 1. The raw data of gene expression profiles of TCM components are available through the National Center for Biotechnology Information's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), and the GEO series accession number is GSE85871. The gene expression profile data can be analyzed in combination with public database CMAP and other bioinformatics methods.

Analysis of TCM component activities.
We first performed a comparison analysis between the gene expression profiles in response to TCM components and the CMAP database to discover their molecular functions. Nitidine chloride (NC, Fig. 2a) is a natural phytochemical alkaloid and a major active compound isolated from the well-known traditional Chinese medicinal herb Zanthoxylum nitidum (Roxb.) DC. Previous studies have reported that NC is a potential anti-tumour drug via the modulation of multiple targets/pathways [9][10][11] . In the present study, the gene expression profiles of NC-treated MCF7 cells were selected to search the CMAP database. The query signature consisted of 752 genes (173 up-regulated and 579 down-regulated; Supplementary Data 1) that were simultaneously submitted to the CMAP database for analysis.
The similarity between the gene expression profiles of the query signature and a CMAP instance was measured using the connectivity score (from −1 to 1). A highly positive connectivity score indicates inducement of the expression of the query signature by the corresponding drug. The CMAP yielded highly positive connectivity scores for NC-treated MCF7 cells. In the detailed results of CMAP analysis, the top 10 instances of positive correlations are presented in Table 2. The results revealed a total of five compounds in the top ten instances, including irinotecan, phenoxybenzamine, hycanthone, camptothecin, and daunorubicin. Among these five compounds, irinotecan and camptothecin are topoisomerase I inhibitors, and daunorubicin is a topoisomerase II inhibitor, as reported in previous studies [12][13][14] . Furthermore, phenoxybenzamine is a known α-adrenoreceptor antagonist in  the treatment of hypertension, but there have been no reports that NC acts as an α-adrenoreceptor antagonist. An extension of this finding would be to hypothesize that NC might perform activities based on same or similar activities as the compounds with the highest positive connectivity scores. Topoisomerase I/II are main targets for antitumor drugs, and some studies have reported that NC inhibits topoisomerase activities [15][16][17] . However, few reports have conducted experiments on the simultaneous inhibition of topoisomerase I/II activities by NC. To validate the topoisomerase I/II inhibitory activity of NC, we evaluated the effect of NC on the stabilization of the cleavable complex that forms in the presence of topoisomerase I/II and DNA. As illustrated in Fig. 2b and c, NC was active against the topoisomerase I/II-mediated relaxation of supercoiled DNA. In addition, the effect of NC completely inhibited topoisomerase I at a concentration of 10 μM and completely inhibited the cleavage activity of topoisomerase II at 5 μM.
Furthermore, to validate the function of NC as an α-adrenoreceptor antagonist, we performed the classic adrenaline reversal experiment. α-Adrenoreceptor-blocking agents have the ability to cause a reversal of the adrenaline pressor response 18 . Adrenaline administration to animals results in an increased mean arterial blood pressure given that its α-receptor agonist properties predominate. However, when an α-adrenoreceptor antagonist is present, the β-receptor agonist property of adrenaline plays a leading role and causes a decrease in arterial pressure or reversal of the pressor response. As shown in Fig. 2d and e, after rats were injected with adrenaline (5 μg/kg), the arterial blood pressure increased abruptly. However, after pre-treatment with an injection of NC (0.1 mg/kg) followed by an injection of adrenaline (5 μg/kg), the arterial blood pressure significantly decreased.   Table 2. Top CMAP analysis of nitidine chloride in detailed results.
In addition, NC dose-dependently reduced the adrenaline pressor response (Fig. 2f). NC (0.1 mg/kg) caused a significant reversal of adrenaline pressor response with a reduction to normal levels (35 ± 1.4 mmHg).
In summary, most molecules have more than one effect, especially TCM molecules. The results of CMAP analysis showed that topoisomerase I/II inhibitor and α-adrenoreceptor antagonist produced high positive connectivity scores. After topoisomerase I/II inhibitory activity assays and adrenaline reversal experiments, NC was validated as an effective topoisomerase I/II inhibitor and candidate α-adrenoreceptor antagonist.
Synergistic mechanism of TCM components. TCMs consist of numerous ingredients, and the therapeutic effect of TCMs mainly originates from the synergistic effect of these multiple components 19 . Synergy is one of the fundamental advantages of multicomponent therapeutics, indicating that combinational effects are greater than the sum of the individual effects 20,21 . However, the mechanisms of synergistic action remain poorly understood. We attempted to conduct a systematic analysis to explore the rationality of the synergistic effects of the principal compounds in Danshen (Salvia miltiorrhiza roots). Danshen is one of the most versatile TCMs based on its properties of improving microcirculation, causing coronary vasodilatation, suppressing the formation of thromboxane, inhibiting platelet adhesion and aggregation, and protecting against myocardial ischemia, among other effects 22,23 . Therefore, Danshen has been used to treat cardiovascular diseases, including coronary artery disease, hypercholesterolemia, hypertension, arrhythmias, and other cardiovascular diseases, for hundreds of years 23 .
In this work, we selected the four main active components (tanshinone IIA, salvianic acid A sodium, protocatechuic aldehyde and salvianolic acid B) in Danshen to elucidate their synergistic effect in the treatment of cardiovascular diseases. In this study, the differential expression levels of genes in MCF7 cells treated with each compound and four mixtures (Supplementary Data 2) were selected and analyzed in the CMAP database. Only drugs associated with cardiovascular diseases with positive connectivity scores and p < 0.01 were summarized ( Table 3). The CMAP analysis results of four mixtures suggested that 11 drugs were related to cardiovascular diseases, including cardiovascular agents, cardiotonic agents, vasodilator agents, anti-arrhythmia agents, antihypertensive agents and calcium channel blockers. Most of these drugs ranked at or near the top, which indicated significant positive enrichment. However, the CMAP analysis results of single compounds indicated that fewer drugs were associated with cardiovascular diseases and received a lower ranking. The CMAP analysis results indicated that four mixtures possessed more positive effects on the therapeutic efficacy of cardiovascular diseases than any other individual components.
We also applied the algorithm of random walk with restart (RWR) to elucidate the synergistic effects of multi-components of Danshen. RWR is a widely accepted algorithm that globally scores each gene in the entire network by computing the effects of seed genes [24][25][26] . Therefore, we computed the cardiovascular effect scores of the mixture of the four components and each single component separately. In addition, to verify whether their effect scores regarding cardiovascular disease were notable, we computed their Z-scores and compared them with random counterparts. The results are listed in Table 4. The mixture of the four components obtained a high  effect score of 0.72, which was much higher than that of any single component. An absolute Z-score greater than 3 is generally deemed as a threshold, which suggests a statistically significant deviation between the actual value and the random values. Thus, the Z-score of 4.958 for the mixture of the four components was much higher than that of any single component, which suggested that the synergy among the four components is significantly associated with the effects of Danshen on cardiovascular disease. Therefore, the RWR algorithm showed that the synergistic effect of the mixture of the four components on multiple target genes outperforms the effects of single components.
In addition, cardioprotective effect assays were conducted on single compounds and four mixtures to validate the synergistic mechanisms of components. As showed in Fig. 3, hypoxia/reoxygenation (H/R)-induced cell injury significantly reduced cell viability. At concentration of 10 µM, all single compounds and four mixtures could notably protect H9c2 cells from H/R-induced cell injury. The cell viability of four mixtures was higher than that of any single compound. Thereby, four mixtures can exert more cardioprotective effect than single compounds, which sheds light on the synergistic therapeutic mechanisms of TCM components.

Discussion
With the application of TCMs in various diseases receiving increasing attention worldwide, more and more efforts have been made to elucidate the mechanisms of TCMs. TCMs are highly diverse, with abundant ingredients, making TCMs a potential repository of molecules for drug discovery. Thus, developing an effective method to investigate the molecular mechanisms of TCMs is necessary. Since the emergence of microarray techniques, gene expression profiling has been widely used in the field of TCMs, and various achievements have been attained using this methodology. Lee et al. 27 used gene expression profiles combined with the CMAP database to elucidate the mechanism of the Chinese herbal medicine berberine, which inhibits global protein synthesis and basal AKT activity and induces endoplasmic reticulum (ER) stress and autophagy. Li et al. 28 adopted gene expression profiles to elucidate the multi-compound, multi-target and multi-pathway mechanism of action of a TCM formula, QiShenYiQi, on myocardial infarction. Therefore, gene expression profiles can provide new insight into the mechanisms of TCMs at the molecular and gene levels.
In this study, we established the gene expression profiles of cells in response to 102 TCM molecules, and this information will likely accelerate progress in understanding the molecular mechanisms of TCMs. Then, we used gene expression profiles combined with the CMAP database to explore the molecular functions of NC. After validation experiments, NC was identified as a topoisomerase I/II inhibitor and a potential new α-adrenoreceptor antagonist. This result indicates that some compounds in TCMs have more than one function and the approach is efficient. In addition, we analyzed the gene expression profiles of four active ingredients in Danshen and elucidated the theory of synergistic action in TCM multicomponent therapeutics. These results demonstrate the reliability and utility of our gene expression profile data. This approach provides an integrative platform to simultaneously analyze a large number of genes associated with TCM ingredients and offers convenience to researchers.
We only selected 102 molecules from TCMs, and these molecules only represent a small part of the numerous compounds in TCMs. In subsequent studies, we will further expand the number of small molecules. Such data will provide more possibilities for research on the molecular mechanisms of TCMs. For example, the theory of detoxification in TCMs is that it occurs mainly through herbal ingredient interactions, which can likely be elucidated by analyzing the gene expression profiles of molecules at the gene level. In the future, the gene expression profiles of cells in response to TCM components can help researchers to perform their own bioinformatics   analyses to clarify the mechanisms of action of TCMs in real time. In short, this is just the beginning, and additional outcomes will depend on the use of the data.

The establishment of gene expression profiles of TCM components. The selection of components in
TCMs. We selected 102 small molecules that are commonly found in Chinese herbs and TCM formulae, such as Radix Salviae Miltiorrhizae, Rhizoma Coptidis, and Shexiang Baoxin Pill. Most of these 102 compounds are the quality control components of TCMs from the China Pharmacopoeia and are selected to represent a broad range of activities and diverse structures.
Cell lines. The gene expression profile data for 102 molecules were produced for MCF7 cells. MCF7 cells are commonly used in the worldwide laboratories as a reference cell line, have clear biological characteristics, remain stable after prolonged culture, and can be cultured in microplates. In addition, MCF7 is the one of the main cell lines used in the CMAP database 4 . Furthermore, the MCF7 cell line was procured from American Type Culture Collection (ATCC) and cultured in MEM/EBSS (Hyclone) supplemented with 10% foetal bovine serum, 1 mmol/L sodium pyruvate, 0.1 mmol/L MEM non-essential amino acids, 100 unit/mL penicillin, and 100 mg/mL streptomycin in an incubator containing 5% CO 2 at 37 °C.
Small molecule-treated cells. The gene expression profiles can be affected by the concentration and duration of compound treatment. According to the CMAP database, the concentration of small molecules was set at a single dose of 10 μM, which is also internationally recognized as a reasonable concentration for high-throughput screening 4 . Simultaneously, the cell's survival rate was investigated upon treatment with compounds at a concentration of 10 μM using the MTT assay. If the cell survival rate was <40%, the number of cells could not meet the needs of microarray analysis. Therefore, the concentration of compounds was decreased to 1 μM until the cell survival rate was >40%. For each compound, the duration of the treatment was 12 h and two biological replicates were performed. Full details of small molecules and treatment conditions are provided in Table 1.
RNA isolation and quality. After pre-treatment, MCF7 cells were harvested and total RNA was extracted using TRIzol reagent (Life Technologies, Carlsbad, CA, US) according to the manufacturer's instructions. To control the quality and purity of isolated total RNA, formaldehyde agarose gel electrophoresis and spectrophotometry (NanoDrop, Wilmington, DE, USA) were performed. Moreover, DMSO-treated cells were selected as a control.
Microarray analysis. The gene expression profiles were assessed using microarray technology with Affymetrix Human Genome U133A 2.0 (Santa Clara, CA, US), which was used in numerous studies, covering 18,400 transcripts and including 14,500 characterized human genes [29][30][31] . Total RNA was purified using a QIAGEN RNeasy Kit (GmBH, Germany) according to the manufacturer's protocols, and biological duplicates were employed for each cell line. Then, total RNA was used to generate double-stranded cDNA and biotin-labelled cRNA. Following fragmentation, cRNA products were hybridized to an Affymetrix Human Genome U133A 2.0. GeneChip array, and hybridized arrays were washed and stained using a GeneChip ® Hybridization, Wash and Stain Kit (Affymetrix). Finally, the fluorescent signals were measured with the GeneChip ® Scanner 3000 (Affymetrix).
The data from this publication have been deposited in NCBI's Gene Expression Omnibus (GEO series accession number: GSE85871). Raw data (CEL files) were normalized by MAS 5.0 algorithm, Gene Spring Software 11.0 (Agilent Technologies, Santa Clara, CA, US). Subsequently, quality control (QC) analysis was performed on the expression data, including overview of QC analysis, quality on-chip analysis, comparative analysis among multiple samples, PCA, and RNA degradation analysis, by using the affy package in R language. The results indicated that all data met the requirements for bioinformatics analysis.
Similarity search against the CMAP. In the investigation of the function of a small molecule, a similarity search against the CMAP was performed. For each treatment of one compound (one treated versus the corresponding DMSO pair), Fold Change was used to filter the differential expression probes which was calculated as follows: first of all, average normalized expression values were calculated for two replications for each small molecule and DMSO; second, Fold Change was represented by average normalized expression value of treatment divided by the average normalized expression value corresponding to DMSO. Then, differential expression probes were selected according to fold change (e.g., FC ≥2 or ≤0.5), the criteria used for filtering the differential expression probes were consistent among the small molecules. The gene-expression signature of the compound was represented by two sets ('up-' and 'down-' probe sets, saved as. grp files and required as the inputs for CMAP), which was made up by the significant up/down regulation probes respectively. The query in the CMAP was performed as a "quick query" in the query section of http://portals.broadinstitute.org/cmap/.
Topoisomerase I/II inhibitory activity assay. Topoisomerase I/II inhibitory activity assays were conducted according to the procedure described in a previous study 32  The DNA TopIIα inhibitory activity of the compounds was measured using a Topoisomerase IIα Drug Screening Kit (TopoGEN, Inc.). Compound concentrations, pBR322 plasmid DNA (0.25 μg) and 0.75 unit of TopII were combined in a final volume of 20 μL buffer (50 mM pH 8.0 Tris-HCl, 150 mM NaCl, 10 mM MgCl 2 , 5 mM dithiothreitol, 30 μg/Ml bovine serum albumin, 2 mM ATP). Then, the following experimental procedures were performed according to Topoisomerase I inhibitory activity assay.
Adrenaline reversal experiments. Male Sprague-Dawley rats (300-350 g) were obtained from the Slac Laboratory Animal Co., Ltd. (Shanghai, China), and given free access to tap water and food pellets. All animal experiments were carried out under standard conditions according to the guidelines for the Care and Use of Laboratory Animals of the National Institutes of Health and were approved by the Committee on the Ethics of Animal Experiments of the Second Military Medical University, China. The animals were anesthetized with urethane (1 g/kg) with minimal suffering. Mean blood pressure was continuously monitored from a cannulated carotid artery using a pressure transducer to a polygraph (Alcott Biotech Co., Ltd. Shanghai). All care was taken that animals could breathe normally. Adrenaline and NC were administered through a catheter inserted into the tail vein. Experiments were performed only after completion of the operative procedures to permit arterial blood pressure to stabilize.
Animals were randomly divided into five groups. Group A rats received a single injection of adrenaline (5 μg/ kg). Group B, C, and D rats received an injection of NC (0.01, 0.025, and 0.1 mg/kg, respectively) after a 2-min injection of adrenaline (5 μg/kg). Group E rats received a single injection of NC (0.1 mg/kg). After the experiments, all animals were sacrificed by tail vein air injection.
RWR-based evaluation of compounds' effect. Data preparation. In total, 301 distinct genes associated with cardiovascular diseases were collected by searching the key word "Cardiovascular Disease" in a plugin of Cytoscope, DisGeNet. In addition, Version 10 of the STRING database was employed as a resource of the PPI network. We extracted interactions with confidence scores greater than 0.9 or target-related edges with maximum scores. Thus, a PPI network was constructed, and its greatest weighted component had 10,270 nodes and 176,739 edges.
RWR algorithm. RWR can globally score seed genes' effects on each gene in the entire network and can be denoted as follows: where P is the column-normalized adjacency matrix of the network, χ 0 is the initial vector that indicates the seed nodes' strength, and χ t denotes a probability vector in which the ith element holds the chance of the walker being at v i node at step t. Parameter r indicates a restart probability indicating the likelihood that the walker will return to seed set at step t. In practice, 0.3 is an optimal value. A steady state of χ t will be reached after performing the eq. (1) iteratively with sufficient time, which can disclose to which extent each node is affected by seed nodes. In this paper, disease genes and differentially expressed genes under the treatment of the molecules are regarded as seed nodes separately. To calculate disease effect, the corresponding component of the disease gene in the initial vector is χ 0 (v) = 1. To compute drug effect, if node v is a drug target, we define its corresponding component in the initial vector χ 0 as χ 0 (v) = 0.01. After running the RWR with these initial vectors, we obtain the drug and disease effect vectors χ drug and χ disease , respectively. Then, we compute the inner product between the effect vectors of drug and disease to measure how the drug-affected network and disease-affected network overlap. The equation is disease drug To measure the statistical significance of the score s, we randomly generate 1000 counterparts that have the same number of drug targets and calculate s scores. Suppose s and Δs r are the mean and standard deviation of these random counterparts' scores, respectively. Then, the z-score can quantify the score difference among the original seed set and counterparts as follows: Typically if the z-score is greater than 3, the drug can be considered as exhibiting statistically stronger effects than random cases.
Cardioprotective effect assay. Cardioprotective effect assays were performed by determing the effects of single compounds and four mixtures against H/R-induced H9c2 cells injury. Rat H9c2 cardiomyocyte cell line was obtained from Chinese Academy of Sciences Cell Bank (Shanghai, China) and maintained in DMEM supplemented with 10% foetal bovine serum at 37 °C in CO 2 incubation. To mimic the ischemic injury in vitro, H9c2 cells were placed in a humidifed chamber containing the cells with 95% N 2 and 5% CO 2 for 4 h and maintained in serum-free and glucose-free DMEM. Then, the cells were transferred to normal conditions for 20 h and cultured in routine culture medium to achieve reoxygenation. The single compounds and four mixtures (1:1:1:1) were added 1 h before the hypoxia period. Cell viability was determined by Cell Counting Kit-8 assay (CCK-8; Dojindo, Kumamoto, Japan).