Introduction

The budding yeast, Saccharomyces cerevisiae, is a powerful model organism for biological studies due to its genetic tractability and ease of propagation [1]. Although human and yeast are distant evolutionarily, essential genes and pathways are highly conserved between these organisms. Indeed, when a total of 414 essential yeast genes were replaced by their human orthologs, nearly half (47%) could be successfully humanized [2]. Because of homologous cell cycle, cell fate, and aging pathways, S. cerevisiae has been directly used to discover therapeutic leads against a wide variety of diseases, in both target-based and phenotypic-based screens [3,4,5,6].

Yeast is an ideal model for global genetic analysis. Numerous studies, including the analysis of haploid deletion mutants of non-essential genes, haploinsufficiency profiling of essential genes [7, 8], and the analysis of essential gene conditional alleles [9, 10], have been carried out to explore single gene function. More complex genetics can be examined through synthetic genetic array (SGA) analysis, an automated technique for studying genetic interactions on a large scale. A genetic interaction occurs when the combination of different gene perturbations leads to an unexpected phenotype, such as synthetic lethality, where two viable mutations combine to generate a lethal or severely sick double mutant phenotype [11, 12]. Genome-wide SGA analysis generated a global genetic interaction map of budding yeast based on genetic interaction profile similarity [13, 14]. The set of genetic interactions for any given query gene comprises its genetic interaction profile. Genes that belong to the same biological pathway tend to share similar patterns of genetic interactions; thus, genetic interaction profiles provide a quantitative description of every gene’s biological function. This allows annotation of many genes to general bioprocesses, such as DNA replication, RNA splicing, protein degradation, and nuclear-cytoplasmic import, at an intermediate profile similarity threshold, or to specific protein complexes and signaling pathways such as the proteasome, the vacuolar H+ ATPase, and the protein kinase C pathway, at higher profile similarity thresholds. Thus, the global genetic interaction map can be viewed as a comprehensive map of cellular function [14].

Chemical genomics combines genetic resources, such as large-scale mutant collections, with chemical libraries to investigate the potential association between compounds and cellular targets [15,16,17]. Recently, we developed an unbiased, high-throughput, chemical genomic screening platform to functionally characterize bioactive compounds in a highly parallel assay [18]. To minimize the amount of compound required for screening, our pipeline employs a diagnostic set of DNA-barcoded gene deletion mutant yeast strains, each constructed in a drug-hypersensitive genetic background. These mutants are pooled and grown competitively in the presence of bioactive compounds. Strain-specific DNA barcodes enable monitoring of strain abundance within a pool and identification of chemical-genetic interactions. A second set of barcodes unique to each compound treatment is introduced by polymerase chain reaction (PCR), allowing highly multiplexed experiments resolved using next-generation sequencing (NGS) [18].

The present study explores the utility of yeast chemical-genetic profiles for discovering small molecules with defined targets in mammalian systems. From a primary analysis of 333 200 compounds from the Chinese National Compound Library (CNCL), which contains small molecules from both synthetic and natural sources, we identified 5360 compounds that inhibited yeast cell growth and subjected this subset to a high-throughput screening (HTS) campaign using our diagnostic chemical genomic platform [18]. Ultimately, 1109 compounds with high-confidence target predictions, associated with 17 major bioprocesses, were identified, and their general cellular functions [18] annotated. To integrate these bioprocess level target predictions with mammalian cell pathways, the chemical-genetic profiles of these compounds were compared to those of a reference library consisting of 1898 approved drugs with known modes of action. Two bioprocesses, namely autophagy, which is highly conserved from yeast to humans [19], and the Wnt/β-catenin signaling pathway, which is found only in higher eukaryotes, were chosen to verify the predicted effects of CNCL compounds on mammalian cells. Our results suggest that yeast-based chemical genomic analysis provides a powerful and general approach for predicting cellular targets in mammalian cells.

Materials and methods

Reagents

Benomyl, D-sorbitol, furazolidone, galactose, glucose, methyl methanesulfonate (MMS), β-mercaptoethanol, and yeast extract were purchased from Sigma (St. Louis, MO, USA). Peptone and RNase A were bought from Sangon Biotech (Shanghai, China). QIAcube HT kits were procured from QIAGEN (Duesseldorf, Germany), Geneclean III kits and Zymolyase-100T from MP Biomedicals (Irvine, CA, USA), Universal Hot Start High-Fidelity PCR Master Mix from Biotool (Shanghai, China), Kapa Library Quantification kits from KAPA Biosystems (Wilmington, MA, USA), and Bortezomib from MedChem Express (Monmouth Junction, NJ, USA).

Antibodies

The following antibodies were used: anti-LC3B (Sigma), anti-β-tubulin (Proteintech, Rosemont, IL, USA), anti-β-actin, anti-GAPDH, anti-β-catenin and anti-rabbit IgG (Cell Signaling Technology, Danvers, MA, USA), as well as anti-VDAC1/porin (Abcam, Cambridge, MA, USA).

Compound library

The 335 098 synthetic and natural product-derived compounds used in this study are stored at the Chinese National Compound Library (CNCL), including 1898 commercially available medicines approved by the Food and Drug Administration (FDA, USA). Their structural diversity covers lactams, heterocycles, amides, secondary amides, sulfonates, sulphonamides, etc. The compounds were of high purity (>95%), checked randomly for quality control, and solubilized in 100% DMSO prior to application.

High-throughput bioactivity screening

The pdr1Δ pdr3Δ snq2Δ drug-hypersensitive strain [18] (Y13206; Supplementary Table 1) and the pool of 310 non-essential gene deletion strains in the same drug-hypersensitive genetic background [18] (diagnostic set) were cultured in YPGal medium (1% yeast extract, 2% peptone, 2% galactose) at 30 °C. All the compounds were dissolved in DMSO from a stock concentration of 2 mM. Positive controls included benomyl, bortezomib, and furazolidone (dissolved in DMSO from a stock solution of 10 mg/mL), as well as methyl methanesulfonate (MMS; dissolved in DMSO from a stock solution of 50 mg/mL).

For bioactivity screening, hypersensitive yeast cells were seeded onto 96-well plates at a density of 9.3 × 10⁴ cells per well using a Multidrop Combi (ThermoFisher Scientific, Rockford, IL, USA) and 1 μL compound was added to each well using a Freedom EVO platform (TECAN, Männedorf, Switzerland; 80 compounds were screened in each plate). After 24-h incubation at 30 °C, OD600 values were measured by an EnSpire Multimode Plate Reader (PerkinElmer, Boston, MA, USA) to evaluate cell viability. Compounds showing 45%–75% growth inhibition compared to DMSO were selected for chemical-genetic screening using the diagnostic set [18].
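The selection rule above can be illustrated with a minimal sketch. It assumes that growth inhibition is computed as one minus the ratio of the compound-well OD600 to the DMSO-well OD600; the function names, the optional blank correction, and the example readings are hypothetical and are not taken from the screening software.

```python
def growth_inhibition(od_compound, od_dmso, od_blank=0.0):
    """Percent growth inhibition relative to the DMSO (vehicle) control."""
    signal = od_compound - od_blank
    reference = od_dmso - od_blank
    if reference <= 0:
        raise ValueError("DMSO control OD600 must exceed the blank")
    return 100.0 * (1.0 - signal / reference)


def select_hits(plate, od_dmso, low=45.0, high=75.0):
    """Return compounds whose inhibition falls inside [low, high] percent."""
    hits = {}
    for compound, od in plate.items():
        inhibition = growth_inhibition(od, od_dmso)
        if low <= inhibition <= high:
            hits[compound] = inhibition
    return hits


# Hypothetical OD600 readings after 24 h (names and values are made up).
plate = {"cmpd_A": 0.90, "cmpd_B": 0.40, "cmpd_C": 0.10}
hits = select_hits(plate, od_dmso=1.00)
print(sorted(hits))
```

In this toy plate only cmpd_B falls inside the 45%–75% window; cmpd_A is too weak and cmpd_C too potent to leave enough cells for pooled profiling.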

High-throughput chemical-genetic screening

Yeast culture conditions were the same as described above. A total of 5520 initial bioactive hits were screened for chemical-genetic profiles against the diagnostic set of pooled yeast strains distributed in 96-well plates (each well containing 4.65 × 10⁵/mL cells, 196 μL YPGal medium, and 2 μL compound). OD600 was measured after 24 h followed by an additional 24-h incubation at 30 °C. The plates were then centrifuged at 1300× g for 4 min before adding 125 μL zymolyase dissolved in 1 M D-sorbitol (final concentration: 0.5 mg/mL) with 11.5 μM β-mercaptoethanol. The samples were incubated for 1 h at 37 °C, centrifuged again at 2272× g for 5 min and processed using the QIAamp 96 DNA kit. Genomic DNA extraction was performed with an automated high-throughput nucleic acid purification robot, QIAcube HT (QIAGEN).

Strain-specific DNA barcodes were amplified using multiplex primers and a communal U2 primer [18]. PCR conditions were set as follows: 3 min at 95 °C for initial denaturation, 30 cycles of 15 s at 95 °C, 15 s at 60 °C, 20 s at 72 °C, and a final extension time of 5 min at 72 °C. PCR products were purified from 2% agarose gels using the Geneclean III kit, quantified using the Kapa qPCR kit and sequenced on a HiSeq 2500 at WuXi AppTec (Shanghai, China).

Data processing and target prediction

The relative abundances of barcoded mutants after compound treatment were quantified using amplicon sequencing. Chemical-genetic interaction z-scores for enrichment or depletion in the presence of the compound relative to the DMSO control were generated from sequencing data using the BEAN-counter software pipeline (https://www.github.com/csbio/BEAN-counter) [18, 20]. The screens were performed in four batches: each batch was processed independently using BEAN-counter. Prior to combining into the final dataset, the first singular value decomposition (SVD) component was removed from each of the four individual datasets to remove batch effects that were not reflective of true signal.
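The batch-correction step can be illustrated with a small sketch that removes the leading singular component from a matrix of z-scores (rows are compound treatments, columns are mutants). This is not the BEAN-counter implementation: the function names are hypothetical, the matrix is made up, and power iteration is used here as a stand-in for a full SVD so that the example needs only the Python standard library.

```python
import math
import random


def top_right_singular_vector(matrix, iterations=200, seed=0):
    """Approximate the first right singular vector by power iteration on X^T X."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    # Random start so the vector is not accidentally orthogonal to the answer.
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # Compute X^T (X v) without forming X^T X explicitly.
        xv = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        w = [sum(matrix[i][j] * xv[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v  # X v vanished (e.g. all-zero matrix): keep current vector
        v = [x / norm for x in w]
    return v


def remove_first_component(matrix):
    """Subtract the rank-1 term sigma_1 * u_1 * v_1^T from the matrix."""
    v = top_right_singular_vector(matrix)
    cleaned = []
    for row in matrix:
        # row . v equals sigma_1 * u_1[i] for this row.
        projection = sum(a * b for a, b in zip(row, v))
        cleaned.append([a - projection * b for a, b in zip(row, v)])
    return cleaned


# Hypothetical z-score matrix dominated by one shared pattern.
batch = [[2.0, 1.0, -1.0], [4.1, 2.0, -2.0], [-2.0, -0.9, 1.0]]
cleaned = remove_first_component(batch)
```

Applying the function to each batch separately, before concatenating the batches, mirrors the order of operations described above.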

Genetic interaction information was obtained from the global S. cerevisiae genetic interaction dataset [14], which was derived from quantitative growth observations of non-essential, double mutant strains described previously [13, 14]. Perturbed biological processes (a subset of 1309 terms from the GO biological process ontology) [21] were predicted from chemical-genetic interaction profiles using the CG-TARGET software pipeline [22]. CG-TARGET leverages the similarity between chemical-genetic interaction profiles and the genetic interaction profiles of their target genes or proteins to predict the biological processes that are perturbed by compounds. It also uses experimental control and randomly re-sampled chemical-genetic interaction profiles to inform its statistical analyses and estimate the false discovery rate (FDR) of each prediction. Predictions with an FDR ≤ 25% were considered to be of high confidence, and the associated compounds were considered high-confidence compounds. Biological process terms were mapped to the set of 17 functional neighborhoods identified by spatial analysis of functional enrichment (SAFE) of the global yeast genetic interaction network [14].
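The idea of estimating an FDR from re-sampled profiles can be sketched generically. The code below is a textbook-style empirical FDR estimate, not the CG-TARGET procedure: it assumes that each prediction carries a single score, that the re-sampled profiles yield a null set of scores, and that the FDR at a threshold is the ratio of the null pass rate to the observed pass rate. All names and numbers are hypothetical.

```python
def empirical_fdr(real_scores, null_scores, threshold):
    """Estimated FDR among real predictions scoring at or above `threshold`."""
    if not real_scores or not null_scores:
        raise ValueError("both score lists must be non-empty")
    real_rate = sum(s >= threshold for s in real_scores) / len(real_scores)
    null_rate = sum(s >= threshold for s in null_scores) / len(null_scores)
    if real_rate == 0.0:
        return 1.0  # convention used here: no discoveries, no confidence
    return min(1.0, null_rate / real_rate)


def high_confidence(real_scores, null_scores, max_fdr=0.25):
    """Real scores that pass at an estimated FDR of at most `max_fdr`."""
    return [
        s for s in real_scores
        if empirical_fdr(real_scores, null_scores, s) <= max_fdr
    ]


# Hypothetical prediction scores from real and re-sampled profiles.
real = [5.0, 4.0, 3.0, 1.0]
null = [4.0, 1.0, 0.0, 0.0]
kept = high_confidence(real, null)
```

With these toy scores only the top prediction is retained, because lower thresholds admit re-sampled profiles at a rate that pushes the estimate above 25%.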

Chemical-genetic profile correlation analysis

To identify potential hits with pharmacological significance, we calculated the similarity between the chemical-genetic profiles of approved drugs (with known modes of action) and bioactive compounds (with high-confidence process predictions). Chemical-genetic profiles of the 1109 compounds comprising the high-confidence set (HCS) were compared to those of the 77 high-confidence drugs derived from the approved drug library. Similarity between any two profiles was calculated using the Pearson correlation coefficient and ranked from highest similarity (1.0) to lowest similarity (−1.0).
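A minimal sketch of this comparison is given below. It assumes each profile is a list of z-scores over the same ordered set of mutants; the compound names and values are invented for illustration, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rank_pairs(compounds, drugs):
    """Correlate every compound with every drug; most similar pairs first."""
    pairs = [
        (pearson(c_profile, d_profile), c_name, d_name)
        for c_name, c_profile in compounds.items()
        for d_name, d_profile in drugs.items()
    ]
    return sorted(pairs, reverse=True)


# Hypothetical profiles over four mutants.
compounds = {"cmpd_1": [-3.0, -1.0, 0.5, 2.0], "cmpd_2": [2.0, 0.0, -1.0, -3.0]}
drugs = {"drug_X": [-2.5, -0.5, 0.0, 1.5]}
ranked = rank_pairs(compounds, drugs)
```

For the full dataset this produces 1109 × 77 correlation values, which is the distribution summarized in Fig. 3a.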

To assess the shared targets of FH535, a known inhibitor of the Wnt/β-catenin signaling pathway [23], and 3305-2 G4 (also known as drupacine, an alkaloid isolated from Cephalotaxus fortunei), a pseudo-compound chemical-genetic profile was generated from the profiles of the two compounds. This new profile was generated by element-wise multiplication of the mean of the two profiles with a weight vector derived from the agreement of interactions across the two profiles. Specifically, the weight for each chemical-genetic interaction was calculated as the product of the corresponding interactions in the two compounds’ profiles, with negative resulting values (i.e., sign disagreements) set to zero. This weight emphasizes chemical-genetic interactions that agree between the two compound profiles to produce a quantitative pseudo-compound profile that is composed of the consensus interactions. Direct targets were predicted by utilizing this profile to calculate a genetic interaction normalized cosine score between the pseudo-compound chemical-genetic interaction profile and potential genetic interaction profile targets. This methodology utilizes the same approach used for finding molecular targets with chemical-genetic interaction profiling [18, 22]. To assess the significance of the gene target predictions for the FH535/3305-2 G4 pseudo-compound profile, two sets of pseudo-compound chemical-genetic interaction profiles were generated and a genetic interaction normalized cosine score was calculated using the procedure described above: (1) FH535 with each of the CNCL compounds that had a high-confidence bioprocess target prediction, and (2) 3305-2 G4 with the same set of compounds. These cosine scores were used to assess the significance of the shared targets between FH535 and 3305-2 G4. For each potential target, we calculated a z-score for the FH535/3305-2 G4 pseudo-compound chemical-genetic interaction profile against each of the two background sets above, and the minimum of the two z-scores was taken as a conservative estimate of the significance of the gene target prediction.
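The construction of the pseudo-compound profile and the conservative z-score can be sketched as follows. The weighting follows the description above (product of the two interactions, negatives set to zero, multiplied by the mean). The cosine-score step is not reproduced; the sketch simply assumes that a target score and two lists of background scores are already available. Function names and all numbers are hypothetical.

```python
import statistics


def pseudo_compound_profile(profile_a, profile_b):
    """Consensus profile: mean of two profiles weighted by their agreement."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same mutants")
    consensus = []
    for a, b in zip(profile_a, profile_b):
        weight = max(a * b, 0.0)  # sign disagreement gives weight zero
        consensus.append(weight * (a + b) / 2.0)
    return consensus


def z_score(value, background):
    """Standard score of `value` against a background sample."""
    return (value - statistics.mean(background)) / statistics.stdev(background)


def conservative_z(score, background_a, background_b):
    """Minimum of the z-scores obtained against the two background sets."""
    return min(z_score(score, background_a), z_score(score, background_b))


# Hypothetical z-score profiles for two compounds over four mutants.
first = [-4.0, -2.0, 1.0, 3.0]
second = [-3.0, 1.0, 2.0, -1.0]
consensus = pseudo_compound_profile(first, second)

# Hypothetical target scores: observed pair versus the two background sets.
z_min = conservative_z(5.0, [1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
```

Interactions on which the two compounds disagree in sign contribute nothing to the consensus, while strong shared interactions are amplified, which is why the first mutant dominates the toy profile.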

Cell culture

HEK293 cells were purchased from American Type Culture Collection (ATCC, Manassas, VA, USA). HeLa and HepG2 cells were generous gifts from Dr. Jian Ding (Shanghai Institute of Materia Medica, Chinese Academy of Sciences). HEK293 and HepG2 cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Life Technologies, Carlsbad, CA, USA) supplemented with 10% fetal bovine serum (FBS; Gibco, Grand Island, NY, USA) and 1% penicillin/streptomycin (Invitrogen, Carlsbad, CA, USA). HeLa cells were cultured in RPMI-1640 medium (Life Technologies) with 10% FBS and 1% penicillin/streptomycin. All cells were incubated in a humidified chamber with 5% CO2 at 37 °C. To generate a cell line stably expressing GFP-LC3, the plasmid (Addgene, Cambridge, MA, USA) was transfected into HEK293 cells with Lipofectamine 2000 reagent (Invitrogen) according to the manufacturer’s instructions. Transfectants were selected by puromycin (Life Technologies) and GFP-positive cell lines were assessed by fluorescence microscopy.

Fluorescence microscopy

HEK293 cells stably expressing GFP-LC3 were seeded onto glass cover slips pretreated with poly-D-lysine (Sigma). After culturing for 24 h to reach 50%–70% confluence, the cells were treated with compounds in fresh medium for 24 h to enable efficient autophagosome (GFP-LC3 puncta) formation. The cover slips were then washed three times with PBS, and the cells were fixed with 4% paraformaldehyde for 15 min at room temperature and washed another three times in PBS. Nuclei were counterstained with Hoechst 33258 (Sigma) for 5 min (followed by three washes in PBS) and fluorescent images were obtained using a TCS-SP8 STED confocal microscope (Leica, Wetzlar, Germany).

Western blotting

Cells were harvested after treatment and lysed in RIPA (Sigma) buffer supplemented with Protease Inhibitor Cocktail (Sigma). The whole cell lysate of each sample was separated by electrophoresis on Tricine-SDS-PAGE gels and transferred to a PVDF membrane (Millipore, MA, USA). The membrane was blocked with 5% non-fat milk in Tris buffered saline/Tween-20 (TBST, EpiZyme, Shanghai, China) and incubated with primary antibodies at 4 °C overnight. The secondary antibodies were used at a dilution of 1:10 000 and incubated at room temperature for 1 h. The protein bands were visualized using the SuperSignal West Dura Extended Duration Substrate (ThermoFisher Scientific). Quantification of the Western blot bands was achieved using ImageJ software (National Institutes of Health, Bethesda, MD, USA) and the quantitative data presented were obtained from at least three independent experiments.

Transmission electron microscopy

HEK293 cells were scraped gently from dishes and immediately fixed with 2.5% glutaraldehyde in 0.1 M sodium cacodylate buffer, pH 7.4. After five washes with PBS for 15 min, the samples were post-fixed in 1% osmium tetroxide, dehydrated in ethanol and acetone, and embedded in Epon. They were then sectioned and examined with an FEI Tecnai 12 transmission electron microscope (Hillsboro, OR, USA) operated at 120 kV.

Cell cycle

HeLa cells were seeded at a density of 1 × 10⁵ per well onto 24-well plates and treated with compounds for 18 h. They were digested by trypsin and collected by centrifugation. The pellets were washed with PBS and fixed with 70% (v/v) ethanol at 4 °C overnight. The cells were centrifuged at 188× g for 5 min to discard the supernatant. The pellets were washed twice with PBS and resuspended in 100 μL propidium iodide (PI) working buffer containing 0.2% Triton X-100, 100 mg/mL RNase A and 50 μg/mL PI in PBS for 30 min at 4 °C in the dark. After incubation, fluorescence intensity was measured with a Novocyte flow cytometer with the Novoexpress software (ACEA Biosciences, San Diego, CA, USA).

Reporter assay

The T-cell factor/lymphoid enhancer factor 1 (TCF/LEF1) response motif sequences were cloned into pGM-Lu plasmid ahead of the minimal TA promoter and luciferase reporter gene (Yeasen, Shanghai, China). Plasmids were transfected into HepG2 cells using Lipofectamine 2000 reagent (Invitrogen) following the manufacturer’s instructions. The cells were seeded onto 96-well plates 24 h after transfection and left for 1 h to allow adhesion to the plate bottom. After adding compounds (20 μM), the plates were kept at 37 °C for 24 h followed by incubation with Steady-Glo reagent (Promega, Irvine, CA, USA) for 10 min at room temperature. Luminescence signals were measured by an EnVision plate reader (PerkinElmer).

Statistical analysis for biological experiments

All biological data were analyzed by GraphPad Prism 7 (San Diego, CA, USA) and the results were presented as means ± SEM. Generally, all biological experiments were carried out with n ≥ 3 replicates and the significance was determined using a two-tailed Student’s t-test. P < 0.01 was considered statistically significant.
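For readers who want to reproduce the summary statistics outside GraphPad Prism, the sketch below computes the mean ± SEM and a Student's t statistic. It assumes an unpaired, equal-variance test, which is an assumption and not stated above; the replicate values are invented, and the P value itself is not computed because the standard library provides no t distribution.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def student_t(group_a, group_b):
    """Unpaired equal-variance t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / dof
    if pooled == 0.0:
        raise ValueError("both groups are constant; t is undefined")
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b)), dof


# Hypothetical normalized LC3-II levels from three independent experiments.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
summary = mean_sem(treated)
t_value, dof = student_t(control, treated)
```

The two-tailed P value would then be read from a t distribution with the returned degrees of freedom and compared with the 0.01 threshold.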

Results

Functional annotation of bioactive compounds by correlation analysis of chemical-genetic profiles with the global yeast genetic interaction network

Of 333 200 synthetic and natural CNCL compounds screened for bioactivity against yeast, 5360 (1.6%) were selected for further chemical-genetic screening based on their ability to inhibit cell growth by more than 20% at a concentration of 20 μM (Fig. 1a). Bioactive CNCL compounds were screened against a diagnostic pool of 310 non-essential deletion mutants that were representative of all major biological processes in S. cerevisiae [18]. A compound-gene pair shows a chemical-genetic interaction if compound treatment causes a relative depletion or accumulation of the corresponding barcoded mutant in pooled growth (Fig. 1b). The set of chemical-genetic interactions for a given compound is referred to as a chemical-genetic interaction profile [24]. Importantly, negative genetic interactions, in which two loss-of-function mutations combine to lead to a more extreme fitness defect than expected, are analogous to negative chemical-genetic interactions because a loss-of-function mutation in a specific gene models the effect of a compound which inhibits the gene’s product (Fig. 1c). Hence, to annotate the major cellular function of a bioactive compound, we compare the chemical-genetic interaction profile of that compound to the compendium of genetic interaction profiles for all yeast genes [14]. Based on Gene Ontology (GO) annotations [21] of the genes that share highly similar profiles with a compound, high-confidence target bioprocess predictions can be inferred (Fig. 1c).

Fig. 1

Yeast chemical-genetic screening pipeline. a Bioactivity screening in a drug-hypersensitive yeast strain (pdr1pdr3snq2, Y13206) was conducted for the Chinese National Compound Library (333 200 compounds), and an approved drug library (1898 compounds). b Bioactive compounds that inhibited growth from 45% to 75% compared to the vehicle-only control were screened against a diagnostic set of 310 DNA-barcoded yeast gene deletion mutants. Mutants were grown in a pooled format, treated with each compound, and harvested. Genomic DNA was extracted from each pool and experiment-specific tags and mutant-specific barcodes were amplified by PCR. The relative abundance of each gene mutant after compound treatment was quantified by NGS of the PCR amplicons. From this, chemical-genetic profiles for each compound were constructed, with a compound and gene sharing a chemical-genetic interaction if the relative abundance of the gene mutant was significantly reduced after compound treatment. c The chemical-genetic profile of a compound (dashed-line box) was compared to the compendium of synthetic lethal genetic interaction profiles in yeast. A compound was mapped onto the genetic interaction network at the position of the gene with the most similar genetic interaction profile from the compound’s top target bioprocess prediction (yellow circle). In this example, the compound’s top target process prediction is glycosylation. The chemical-genetic interaction profiles of 1109 CNCL compounds with high-confidence target bioprocess predictions were compared to the profiles of 77 approved drugs with high-confidence target predictions. Unknown compounds that shared high profile similarity with approved drugs were identified for validation

Among the screened bioactive compounds, 1109 (21%) were identified to have high-confidence bioprocess target predictions (Fig. 1c). A similar chemical-genetic analysis was carried out using an approved drug library (Selleck Chemicals, Houston, USA), in which all compounds were previously linked to characterized targets in human cells. The approved drug library is relatively small but had 77 (48%; out of 160 initial hits) high-confidence bioprocess target predictions, a more than two-fold higher prediction rate than that of the bioactive CNCL compounds.

Compounds with high-confidence bioprocess target predictions from both the CNCL and approved drug libraries were mapped onto the global genetic interaction network (Fig. 2). This network contains 17 gene clusters corresponding to distinct bioprocesses (Fig. 2a). By placing each compound at the gene node with the most similar genetic interaction profile within the compound’s top predicted biological process target, we constructed the chemical-genetic networks for the CNCL and the approved drug library (Fig. 2b, Supplementary Tables 2 and 3). The colored nodes shown on these networks correspond to the top predicted gene targets for each compound tested, many of which fall into a specific bioprocess region, as defined by the global genetic interaction network (Fig. 2, dashed outlines). Specific compounds from the CNCL library that were experimentally validated (see below) are highlighted in the network (Fig. 2b, Chinese National Compound Library). Supplementary Tables 2 and 3 list the high-confidence target GO process and direct gene target predictions for the approved drugs and CNCL compounds, respectively.

Fig. 2

Chemical-genetic networks for the Chinese National Compound Library and a library of approved drugs. a The global yeast genetic interaction network. Colored nodes represent genes. Genes with highly similar genetic interaction profiles are in close proximity, while genes with dissimilar profiles are distant. The network is annotated using Spatial Analysis of Functional Enrichment (SAFE), resulting in 17 general bioprocess annotations for different regions of the network (colored outlines). Genes that were not annotated to a SAFE category are not shown. b The chemical-genetic networks for compounds from the Chinese National Compound Library (middle network) and approved drugs (right network) with high-confidence target predictions. Colored nodes represent compounds whose top predicted gene targets can be mapped to any general bioprocess defined by SAFE. The colored outlines are the regions of the genetic interaction network that correspond to the 17 bioprocesses. Each node/compound is placed on the map at the position of the gene within the compound’s top predicted target process whose genetic interaction profile is the most similar to the chemical-genetic interaction profile of the compound. Compounds with positive validation results are highlighted in the CNCL network (big circles with black outlines). CD0296-G005 and WNN1716-H005 (white nodes) were validated to modulate autophagy, but do not belong to any SAFE bioprocess, even though they have high-confidence target GO process predictions. Other compounds with high-confidence target GO process predictions that do not map to any of the 17 SAFE bioprocesses are not shown

Selected drugs from the approved drug library map to different parts of the network (Fig. 2b, Approved Drug Library). In particular, drugs that target highly conserved complexes and pathways map to the expected bioprocesses. For instance, idarubicin, a chemotherapeutic agent that binds to DNA and interferes with DNA unwinding by topoisomerase II during replication [25], is positioned within the “DNA replication and repair” bioprocess. CEP-18770 (delanzomib), an inhibitor of the chymotrypsin-like activity of the proteasome [26], is located in the “Protein degradation” bioprocess and it has RPT6, which encodes a proteasome subunit, as its top predicted gene target. Other drugs with similar known modes of action map to the same yeast bioprocess and have the same predicted direct target gene. For example, triapine inhibits ribonucleotide reductase, while 5-fluorouracil (5-FU) suppresses thymidylate synthase. Both effectively deplete dNTP levels required for DNA and RNA synthesis. The genes encoding these targets have pleiotropic genetic interaction profiles and thus, do not localize to a distinct bioprocess cluster on the global genetic network. Nonetheless, triapine and 5-FU are mapped to the same predicted gene target NOP2, which is required for processing ribosomal RNA, a finding that presumably reflects the similar role of these compounds in RNA-based cytotoxicity [27].

The functional signatures of each compound library can be visualized by plotting the fraction of compounds in a library that are annotated to a particular biological process, compared to the fraction of gene profiles in the genetic interaction background set that are annotated to the same process. For the CNCL high-confidence predictions, bioprocesses related to vesicle trafficking, glycosylation, and metabolism were overrepresented, while bioprocesses related to mitosis, chromatin organization, mRNA processing, and DNA repair were strikingly underrepresented compared to a background distribution based on yeast gene annotations (Supplementary Fig. 1). The distribution is similar to those of other large libraries we have previously screened using this method, such as the RIKEN Natural Product Depository and the NIH Structural Diversity Set [18]. In contrast, the approved drug library has a greater representation in a number of these underrepresented bioprocesses, which, in part, reflects an enrichment for drugs such as idarubicin and mocetinostat (MGCD0103), which are anti-cancer agents that target highly conserved bioprocesses, such as DNA repair, mitosis, and epigenetic mechanisms of gene regulation [25, 28] (Supplementary Fig. 1).

Correlating chemical-genetic profiles to predict compound mode-of-action

Chemical-genetic profiles can also be compared directly to one another in order to identify molecules with similar modes of action. This has been shown to work for known drugs with direct yeast targets, such as concanamycin A and cerulenin [29]. While this approach is powerful, it has not been explored extensively for drugs whose primary target is absent in yeast. Comparing the chemical-genetic interaction profiles of CNCL compounds that satisfied our high-confidence cut-off to those of the approved drug library (Fig. 1c), we found that 861 (78%) CNCL compounds in the high-confidence prediction set (1109 hits) shared a similar chemical-genetic interaction profile (Pearson correlation coefficient (PCC) > 0.2588, corresponding to the top 5% of all pairs of chemical-genetic profiles tested, Supplementary Table 4) with at least one of 66 (86%) of the approved drugs from the high-confidence prediction set (77 hits, Fig. 3). For more detailed analysis, we selected CNCL compounds that showed highly correlated chemical-genetic profiles with trifluoperazine [30] or wortmannin [31], which affect autophagy. In fact, 11 out of the 38 CNCL chemical-genetic profiles that passed the PCC threshold (>0.2588) were among the most similar to the autophagy compound profiles relative to possible tested pairs of chemical-genetic profiles (top ~0.4%, PCC > 0.5, Fig. 3). In addition, we also examined FH535, a compound that targets the Wnt/β-catenin signaling pathway [23], which is not found in yeast, and selected 18 CNCL compounds with chemical-genetic profiles that exhibited strong overlap with the Wnt compound profile (top ~4% of overall distribution, PCC ranging between 0.28 and 0.42, Fig. 3).
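The top-5% cut-off and the count of compounds with at least one matching drug can be illustrated with a short sketch. It assumes the correlation values are held in a nested mapping from compound to drug to PCC; the names and values are invented. Note that the helper returns the smallest value that still lies inside the top fraction, so membership is tested with "greater than or equal to" rather than the strict inequality quoted above.

```python
def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of `values`."""
    if not values or not 0.0 < fraction <= 1.0:
        raise ValueError("need values and a fraction in (0, 1]")
    k = max(1, round(len(values) * fraction))
    return sorted(values, reverse=True)[k - 1]


def compounds_with_match(pcc, cutoff):
    """Compounds having at least one drug whose PCC reaches the cutoff."""
    return sorted(
        compound
        for compound, by_drug in pcc.items()
        if any(value >= cutoff for value in by_drug.values())
    )


# Hypothetical correlation table: compound -> drug -> PCC.
pcc = {
    "cmpd_1": {"drug_X": 0.62, "drug_Y": 0.05},
    "cmpd_2": {"drug_X": 0.10, "drug_Y": -0.20},
    "cmpd_3": {"drug_X": -0.30, "drug_Y": 0.31},
}
all_values = [v for by_drug in pcc.values() for v in by_drug.values()]
cutoff = top_fraction_cutoff(all_values, fraction=0.5)
matched = compounds_with_match(pcc, cutoff)
```

With the real data the same two steps would yield the 0.2588 threshold and the 861 matched CNCL compounds reported above.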

Fig. 3

Distribution of profile correlation values between 1109 HCS compounds and 77 approved drugs. a Distribution of all correlation values between 1109 high-confidence compounds and 77 high-confidence approved drugs based on chemical-genetic profile similarity. The dashed line delineates the top 5% cut-off for correlation values (PCC > 0.2588) in the distribution. The Pearson correlation values of compounds that were selected for the autophagy validation are marked as red bars, while blue bars mark the correlation values of compounds selected for Wnt pathway validation. b Top: The Pearson correlation values between Wnt pathway-linked compounds and FH535, a known Wnt pathway inhibitor. Compounds with positive validation results, either from a luciferase assay to assess inhibition of gene expression from the TCF/LEF1 promoter, or from a Western blot assay to assess β-catenin expression levels, are marked. Treatment with 3305-G4, a novel natural product, inhibited TCF/LEF1 promoter activity and also led to reduced β-catenin expression. Bottom: The Pearson correlation values between autophagy pathway-linked compounds and either trifluoperazine or wortmannin, two compounds which are known to activate or inhibit autophagy, respectively. Compounds with positive validation results, either by microscopy-based assays to measure GFP-LC3 puncta and observe autophagic vacuoles, or a Western blot analysis to determine LC3-II levels, are marked. CD0345-E003, CD0354-A010, and WNN0090-F002 were bioactive hits in all assays

Pathway validation

Autophagy

Autophagy generally refers to macroautophagy, an evolutionarily conserved cellular process with a key role in maintaining intracellular homeostasis, and is induced by stress conditions such as nutrient starvation or drug treatment [32, 33]. The initiation of autophagy includes phagophore formation, followed by expansion into an autophagosome and fusion of autophagosomes with lysosomes, causing degradation of intracellular components [33]. In total, 38 CNCL compounds with profiles resembling that of either trifluoperazine or wortmannin were selected for further validation (Fig. 4 and Supplementary Fig. 3). LC3 (microtubule-associated protein 1 light chain 3) is a widely used autophagy marker; the number of GFP-LC3 puncta increases either when autophagy is induced or when autophagosome-lysosome fusion is blocked [33]. We found that 37% (14 of 38) of compounds predicted to target autophagy exhibited increased GFP-LC3 puncta at similar levels to rapamycin, the canonical inhibitor of mTORC and an activator of autophagy (Fig. 4a, b) [33, 34]. Since autophagy is a dynamic process, observing autophagic flux is required to confirm the action exerted by the compounds in question. During autophagy induction, cytoplasmic LC3-I is converted into LC3-II by conjugation with phosphatidylethanolamine, and this initiates the formation and lengthening of autophagosomes [32]. As a result, LC3-II levels increase with more autophagosome formation [32, 33]. We examined LC3-II protein levels, focusing on three compounds (CD0345-E003, CD0354-A010, and WNN0090-F002, Fig. 4c, d) as representatives, and found that all three increased LC3-II in a time-dependent manner. We then tested the remaining 11 compounds with a 24-h treatment scheme, and the results showed that they all significantly increased the LC3-II protein levels (Fig. 4e), indicating that all 14 compounds were capable of modulating autophagy. Furthermore, we observed autophagic vacuoles [34], including autophagosomes (or initial autophagic vacuoles, AVi) and degradative autophagic vacuoles (AVd), in human HEK293 cells treated with CD0345-E003, CD0354-A010 and WNN0090-F002 using transmission electron microscopy (Supplementary Fig. 4).

Fig. 4

Functional validation of high-confidence predicted compounds in autophagy pathway. a HEK293 cells stably expressing GFP-LC3 were treated with DMSO, rapamycin (200 nM) and compounds (10 μM; 5 μM for WNN1716-H005 due to cytotoxicity) from the predicted set, respectively. GFP-LC3 puncta formation was monitored with confocal microscopy and representative fluorescent images of cells counterstained with Hoechst (nuclei) are shown. Scale bar, 10 μm. In the high-confidence predicted set, CD0345-E003, CD0354-A010, and WNN0090-F002 are representative active compounds which induced GFP-LC3 puncta formation, while CD0274-E005 is an inert compound. b Comparison of the autophagy-related phenotype induced by 38 predicted compounds based on the quantification of GFP-LC3 puncta. Green fluorescent spots represent autophagosomes (n > 50 cells per sample). Data shown are means ± SEM of three independent experiments. c HEK293 cells were treated with CD0345-E003, CD0354-A010 and WNN0090-F002 (10 μM) for 6, 12, 24, and 48 h. LC3-II and β-actin levels were assessed by Western blotting. Densitometric values were quantified and normalized to the control which was set to 1.0. Data shown are means ± SEM of three independent experiments. d Chemical structures of rapamycin, CD0345-E003, CD0354-A010, and WNN0090-F002. e HEK293 cells were treated with 11 other predicted compounds (10 μM; 5 μM for WNN1716-H005 due to cytotoxicity) for 24 h, LC3-II and β-actin levels were assessed by Western blotting. Densitometric values were quantified and normalized to the control which was set to 1.0. Data shown are means ± SEM of three independent experiments. f Comparison of the autophagy-related phenotype induced by 26 random compounds based on the quantification of GFP-LC3 puncta (n > 50 cells per sample). Data shown are means ± SEM of three independent experiments. Statistical significance was determined with a two-tailed Student’s t-test; *P< 0.01 and **P< 0.001

As a negative control, we selected compounds with high-confidence bioprocess target predictions that did not overlap with autophagy-related pathways, excluding compounds with structures that resemble those predicted to modulate autophagy. None of the 26 control compounds induced GFP-LC3 puncta formation (Fig. 4f and Supplementary Fig. 5). Furthermore, the mean PCC value for the chemical-genetic profiles of the 14 compounds with autophagic activity was 0.568, which was higher than that of the complete set of 38 predicted compounds (0.456). This suggests that chemical-genetic profile similarity is informative for experimental validation: for the autophagy bioprocess, a compound with a higher correlation score is more likely to be validated biologically.

Mapping the validated compounds linked to autophagy onto the CNCL chemical-genetic interaction network, we observed compounds with distinct validation phenotypes segregating into two groups. Two compounds (WNN0090-F002 and CD0345-E003) that had positive LC3 fluorescence microscopy, Western blot and transmission electron microscopy results mapped to the DNA replication/repair and mitosis/chromosome segregation bioprocesses in the network (Fig. 2b). Recently, it has been shown that agents causing different types of DNA damage can lead to targeted autophagy by activating the yeast Mec1/ATR, Tel1/ATM, and Rad53/CHEK2 kinase cascade [35]. However, the majority of the compounds (CD0295-C003, CD0296-D002, CD0296-H004, CD0296-H007, CD0338-F007, CD0447-G005, WNN0040-A007, and WNN0159-G002), validated by fluorescence microscopy and Western blotting, were linked to genes in the vesicular traffic bioprocess (Fig. 2b), which plays a major role in numerous aspects of autophagy [36]. The bioprocess predictions for these compounds included “ATP hydrolysis coupled proton transport” and “vacuolar acidification”, relating to the lysosomal H+ ATPase, whose activity is required for proper autophagic flux [36]. Taken together, these data show that our yeast chemical-genetic profiling platform provides a powerful approach for predicting compounds that modulate autophagy in human cells.

Wnt/β-catenin pathway

We also examined the effect of compounds that had profile similarities with FH535, a known inhibitor of the Wnt/β-catenin pathway, in human cells [23]. FH535 suppresses β-catenin/TCF-mediated transcription and exhibits anti-proliferative effects in various cancer cell lines. Although this pathway is not present in yeast [37], we found that chemical-genetic profiles associated with 18 CNCL compounds were significantly correlated with the chemical-genetic profile of FH535 (Fig. 3). Wnt/β-catenin signaling is an important pathway that influences cell differentiation, proliferation and fate [37, 38]. β-catenin activity in this pathway can be grouped into two processes: accumulation and nuclear translocation. The former is regulated by phosphorylation, while the latter results in the formation of the β-catenin-TCF/LEF1 complex, which activates transcription of downstream genes. Based on this, we conducted a reporter gene assay to investigate the effects of predicted compounds on downstream transcription controlled by the β-catenin-TCF/LEF1 complex [39], and Western blot analysis to determine compound effects on β-catenin accumulation and phosphorylation mediated by glycogen synthase kinase 3 (GSK-3) [40].

For the first assay, we used cells with the TCF/LEF1 promoter element linked to the luciferase reporter gene, and measured luciferase activity after compound treatment. Like FH535, which decreased luciferase activity by 72.08% after 24-h treatment at 20 μM, three compounds reduced luminescence intensity: CD0450-A007 (39.61%), 3215-1 A11 (61.64%), and 3305-2 G4 (53.08%) (Fig. 5a), suggesting that these compounds inhibited transcription from Wnt/β-catenin signaling responsive elements (TCF/LEF1). We also studied the effects of these compounds on the expression level of β-catenin. While neither CD0450-A007 nor 3215-1 A11 differed significantly from the DMSO control, 3305-2 G4 treatment clearly decreased β-catenin expression, consistent with reduced luciferase activity in the TCF/LEF1 reporter assay (Fig. 5b). Dose-response experiments for luciferase activity demonstrated that 3305-2 G4 and FH535 had a similar IC50 (7.44 μM vs. 7.43 μM, Fig. 5c), a finding that was further supported by Western blot analysis (Fig. 5d). Thus, both FH535 and 3305-2 G4 altered the expression of active β-catenin. For negative controls, the effects of 24 random compounds were examined similarly. Two of these compounds displayed FH535-like inhibition in the luciferase assay, namely CD0203-G006 (68.69%) and CD0251-G005 (71.75%, Fig. 5e), but none of the random compounds had any impact on β-catenin expression (Fig. 5f).

Fig. 5

Functional validation of high-confidence predicted compounds in Wnt/β-catenin signaling pathway. a Three out of eighteen compounds induced luminescence signal changes in HepG2 cells transiently transfected with a luciferase reporter gene under the control of TCF/LEF1 response element. Luciferase activity was determined after compound exposure for 24 h. Data shown are means ± SEM of four independent experiments. b 3305-2 G4 and FH535 decreased β-catenin expression in HepG2 cells. Quantitative densitometric analysis of Western blotting from three independent experiments and the data shown are means ± SEM. c Dose-response curves of luminescence signal induced by 3305-2 G4 and FH535. Data shown are means ± SEM of three independent experiments. d Compound 3305-2 G4 decreased β-catenin expression in a dose-dependent manner in HepG2 cells. Quantitative densitometric analysis of Western blotting from three independent experiments and data shown are means ± SEM. e Two out of twenty-four random set compounds induced luminescence signal changes in HepG2 cells. Data shown are means ± SEM of three independent experiments. f None of the compounds decreased β-catenin expression in HepG2 cells. Quantitative densitometric analysis of Western blotting from five independent experiments and data shown are means ± SEM. Statistical significance was determined with a two-tailed Student’s t-test; *P< 0.01 and **P< 0.001

3305-2 G4 is a known alkaloid called drupacine [41] (Fig. 5c), isolated from Cephalotaxus fortunei, whose structure was identified by spectroscopic data analysis [42]. 3305-2 G4 mapped to “mRNA processing” (Fig. 2b), while FH535 mapped to “glycosylation, protein folding and targeting, cell wall biogenesis” bioprocesses (Fig. 2b). However, a closer examination of high-confidence target process predictions for these compounds showed that FH535 and 3305-2 G4 share a common process prediction, “Response to acidic pH” (Supplementary Table 5). The top driver genes for this target prediction are BCK1 and SLT2, which encode the MAPKKK and MAPK of the protein kinase C signaling pathway that controls cell wall integrity in yeast [43]. Though the Wnt pathway, the human target for FH535, is not present in yeast, it is possible that FH535 and 3305-2 G4 both act on a conserved pathway that is relevant to the Wnt signaling mechanism.

To further assess the shared targets of FH535 and 3305-2 G4, a “pseudo-compound” chemical-genetic profile was calculated as a weighted mean profile of the two compounds, which accentuates large chemical-genetic interaction scores that are common to FH535 and 3305-2 G4. The mean chemical-genetic profile of FH535 and 3305-2 G4 was weighted by the element-wise product of the two compound profiles. Direct targets were predicted for this pseudo-compound profile, using the same process applied to calculate direct targets for real compounds. Among the top gene targets calculated for the merged FH535/3305-2 G4 chemical-genetic profile were CDC73 (ranked 5th out of 4625 genes, Supplementary Table 6) and MRK1 [18] (ranked 6th out of 4625 genes, Supplementary Table 6). CDC73 is a component of the PAF1 complex, which binds to and modulates the activity of RNA polymerases I and II, and is required for gene expression, histone modification, and telomere maintenance [44]. Its human homolog, parafibromin, is a tumor suppressor linked to parathyroid, renal, and uterine cancers [45]. Parafibromin and other components of the PAF1 complex are required for the nuclear transduction of the Wnt signal, and bind directly to the C-terminal portion of β-catenin to control the activity of Wnt target genes [46]. MRK1 is one of four GSK-3 homologs in yeast, and activates Msn2-dependent transcription of stress-responsive genes that function in protein degradation [47]. GSK-3 is a key enzyme in Wnt signaling, and phosphorylates β-catenin, leading to its subsequent degradation [48]. Although the Wnt pathway itself is absent in yeast, the merged chemical-genetic profiles of FH535 and 3305-2 G4 were capable of predicting direct gene targets that are highly conserved and are important players in Wnt signal transduction.
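The pseudo-compound calculation described above can be illustrated with a minimal sketch. The text states only that the mean profile was weighted by the element-wise product of the two profiles; the exact normalization is not given, so the function name `pseudo_compound_profile`, the choice to zero out genes whose scores disagree in sign, and the toy scores below are all assumptions made for illustration, not the published implementation.

```python
def pseudo_compound_profile(profile_a, profile_b):
    """Merge two chemical-genetic profiles into one pseudo-compound profile.

    Each profile is a list of interaction scores, one per gene, in the
    same gene order. Assumption: the per-gene weight is the product of
    the two scores, clipped at zero so that genes with opposite-sign
    scores do not contribute (this clipping is an illustrative choice).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genes")
    merged = []
    for a, b in zip(profile_a, profile_b):
        mean_score = (a + b) / 2.0      # plain mean of the two scores
        weight = max(a * b, 0.0)        # large only when both scores are large and agree
        merged.append(mean_score * weight)
    return merged


# Toy example with three hypothetical genes.
fh535_like = [-4.0, -0.5, 2.0]
g4_like = [-3.0, -0.2, -1.0]
merged = pseudo_compound_profile(fh535_like, g4_like)
```

In this toy case the first gene, which is strongly negative in both profiles, dominates the merged profile, while the weakly shared and discordant genes shrink toward zero. This is the intended effect of emphasizing overlap before target prediction.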

Discussion

Mode-of-action discovery for novel compounds is a well-recognized rate-limiting factor for rational drug development. Chemical-genetic profiling in yeast identifies target pathways in an unbiased manner and has proven to be amenable to HTS campaigns, though the utility of this method in finding compounds that impinge on metazoan pathways, especially those that are absent from yeast but found in higher organisms, remains relatively unexplored. The present analysis annotates a complex library of small molecules employing a diagnostic chemical genomic pipeline [18]. We used a comprehensive database of yeast genetic interactions [14] as a key for interpreting the chemical-genetic profiles and linking the predicted compounds to general target bioprocesses. Indeed, when 16 compounds predicted to target the tubulin cytoskeleton assembly pathway were evaluated using techniques such as cell cycle analysis, two of them were successfully validated to disrupt tubulin function in HeLa cells (Supplementary Fig. 2). Importantly, we also developed a dataset of chemical-genetic profiles for approved drugs with known modes of action, and applied it to identify new hits whose chemical-genetic profiles are highly similar to those of known drugs with well-established pharmacology in metazoans. Notably, we have shown that matching the chemical-genetic profiles of hits from HTS campaigns to those of known drugs allows the identification of compounds with targeted activity in mammalian cell pathways.

As proof of concept, several drugs with highly conserved cellular targets have chemical-genetic target gene and process predictions that are consistent with their known modes of action. Idarubicin, delanzomib, and mocetinostat act on diverse cellular targets (topoisomerase II, proteasome, and histone deacetylase, respectively), and these compounds were accurately linked to their target pathways and bioprocesses in yeast, namely, DNA replication and repair, proteasome/protein degradation, and mRNA processing (Fig. 2b).

Correlation analysis of chemical-genetic profiles, which compared approved drugs and CNCL compounds, was effective in linking novel compounds to autophagy, a pathway that is highly conserved from yeast to human, and to the Wnt/β-catenin signaling pathway, which is not present in yeast. The autophagy-linked compounds generally had higher PCC values (mean of 0.456 across the 38 predicted compounds) than the Wnt-linked compounds (~0.35). The difference in distribution of these values likely reflects the high degree of conservation between autophagy mediators in yeast and metazoans. Thus, this guilt-by-association analysis may be particularly powerful for finding compounds that work in essential, highly conserved biological pathways.
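As an illustration of this guilt-by-association step, the following sketch computes Pearson correlation coefficients between compound and drug profiles and keeps the top fraction of pairs, mirroring the top 5% cut-off used in Fig. 3a. The function names, the dictionary-based input format, and the toy profiles are hypothetical; the published pipeline may differ in normalization and filtering.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def link_compounds_to_drugs(compounds, drugs, top_fraction=0.05):
    """Return the highest-correlated (PCC, compound, drug) pairs.

    compounds, drugs: dicts mapping a name to its profile (list of scores).
    top_fraction: share of all compound-drug pairs to keep (0.05 = top 5%).
    """
    pairs = [
        (pearson(c_profile, d_profile), c_name, d_name)
        for c_name, c_profile in compounds.items()
        for d_name, d_profile in drugs.items()
    ]
    pairs.sort(reverse=True)  # highest PCC first
    keep = max(1, math.ceil(len(pairs) * top_fraction))
    return pairs[:keep]


# Toy example: two hypothetical compounds against one hypothetical drug.
compounds = {"c1": [1.0, 2.0, 3.0, 4.0], "c2": [4.0, 3.0, 2.0, 1.0]}
drugs = {"d1": [2.0, 4.0, 6.0, 8.5]}
hits = link_compounds_to_drugs(compounds, drugs, top_fraction=0.5)
```

Here `hits` contains only the pair whose profiles rise and fall together, which is the sense in which a compound inherits a candidate mode of action from the drug it most resembles.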

One distinct advantage of this approach is the ability to group compounds into functional categories that may reflect their mode-of-action in cells. A striking observation was the segregation of the autophagy-linked compounds into two regions of the chemical-genetic interaction network. The majority of the compounds, which were validated only by puncta formation as viewed by fluorescence microscopy and by a general measurement of LC3-II expression by Western blotting, mapped to the “vesicular traffic” bioprocess. Compounds that also had positive results in the more rigorous assays of autophagic activity (transmission electron microscopy and an LC3-II expression time course) mapped to the “DNA replication and repair” and “nuclear-cytoplasmic transport” parts of the network. The functional segregation of these validated hits may reflect commonalities in mode-of-action that would not have been apparent from an assay that focuses on a single phenotypic outcome such as autophagic vesicle formation.

Interestingly, the autophagy-modulating compounds can all be described as cationic amphiphilic drugs (CADs) with common physicochemical properties. CADs have a hydrophobic ring structure and a hydrophilic side chain with a charged cationic amine group, and comprise a diverse array of substance classes such as antiarrhythmics and antipsychotics [49]. It has recently been shown that high concentrations of CADs tend to localize at autophagosomal compartments after long exposure times [50]. Accumulation of the tested CADs in the autophagosome may, at least in part, explain the observed modulation of autophagy by these compounds.

We also identified compounds that act on the Wnt/β-catenin signaling pathway, which is completely absent in yeast, pointing to the possibility of using yeast chemical-genetic profile similarity as a way to link compounds to drugs that may not have conserved yeast targets. Notably, FH535 and 3305-2 G4 map to separate parts of the chemical-genetic interaction network: “Glycosylation, protein folding, and cell wall biogenesis” for the former and “mRNA processing” for the latter (Fig. 2). This suggests that these compounds have distinct functional consequences in yeast, though there is significant overlap in their chemical-genetic profiles, leading to a PCC value of ~0.35 (P = 1.05 × 10⁻⁸). Interestingly, by merging the chemical-genetic profiles of FH535 and 3305-2 G4 and weighting the interaction scores to emphasize the overlap between their profiles, we predicted direct target yeast genes (CDC73 and MRK1) whose human homologs are key players in the Wnt signaling pathway. These homologs either bind to β-catenin directly to activate expression of target genes upon transduction of the signal into the nucleus (parafibromin, homolog of CDC73), or stimulate cytoplasmic degradation of β-catenin by phosphorylation (GSK-3, homolog of MRK1). Furthermore, 3305-2 G4 is a known natural compound (drupacine) that is capable of suppressing cancer cell growth in vitro [51]. Whether this effect is related to the Wnt/β-catenin signaling pathway remains to be explored. Orthologous phenotypes have been previously used for systematic discovery of non-obvious human disease models [52], and our study provides a framework for identifying orthologous chemical-genetic gene targets in yeast for compounds that act on therapeutically relevant non-conserved pathways in humans.

In conclusion, our findings demonstrate that compounds with target bioprocess predictions identified in an unbiased yeast-based chemical-genetic system can be validated in mammalian cells. These target predictions can be derived directly for highly conserved cell functions, or they can be inferred from chemical-genetic profiles of therapeutics with known modes of action. This general approach shows great promise for utilizing yeast chemical-genetic profiling for identifying pathway-specific drug leads towards a wide variety of human diseases.