Transcriptome and network analyses in Saccharomyces cerevisiae reveal that amphotericin B and lactoferrin synergy disrupt metal homeostasis and stress response

Invasive fungal infections are difficult to treat. The few available antifungal drugs have problems with toxicity or efficacy, and resistance is increasing. To overcome these challenges, existing therapies may be enhanced by synergistic combination with another agent. Previously, we found amphotericin B (AMB) and the iron chelator, lactoferrin (LF), were synergistic against a range of different fungal pathogens. This study investigates the mechanism of AMB-LF synergy, using RNA-seq and network analyses. AMB treatment resulted in increased expression of genes involved in iron homeostasis and ATP synthesis. Unexpectedly, AMB-LF treatment did not lead to increased expression of iron and zinc homeostasis genes. However, genes involved in adaptive response to zinc deficiency and oxidative stress had decreased expression. The clustering of co-expressed genes and network analysis revealed that many iron and zinc homeostasis genes are targets of transcription factors Aft1p and Zap1p. The aft1Δ and zap1Δ mutants were hypersensitive to AMB and H2O2, suggesting they are key regulators of the drug response. Mechanistically, AMB-LF synergy could involve AMB affecting the integrity of the cell wall and membrane, permitting LF to disrupt intracellular processes. We suggest that Zap1p- and Aft1p-binding molecules could be combined with existing antifungals to serve as synergistic treatments.

manufacturer's instructions, with 2 μg of total RNA for each of the three biological triplicate samples as input. The libraries were enriched using 15 cycles of PCR and the insert size ranged from 80 to 330 base pair (bp). The libraries were randomized and sequenced in two separate lanes using HiSeq 2000 with the TruSeq v3 SBS reagents to generate 100 bp paired-end reads. The data were demultiplexed using Casava 1.8.2.
For LF treatment and the corresponding matching controls, a TruSeq Stranded mRNA-seq prep kit was used to prepare the RNA samples as per protocol using 1 µg of total RNA for each of the three biological triplicate samples as input. The libraries were enriched using 12 cycles of PCR to minimize the number of PCR duplicates. The libraries were multiplexed and sequenced on the same flow cell using NextSeq500 v2 reagents to generate 75 bp single reads. The data were demultiplexed using Casava 1.8.2.

Processing of RNA-seq data
RNA-seq data were analysed as per Twine et al. (2013) 1 . Briefly, sequencing reads for all samples were assessed for quality using FastQC (version 0.10.1; www.bioinformatics.babraham.ac.uk/projects/fastqc).
All samples had an average Phred score of 30 or greater. The SolexaQA (version 2.2) was used to trim the paired-end reads using p-value < 0.05 as a cutoff and selecting high quality paired-end reads with length of at least 25 nucleotides 2 . A median of 93.26% reads across all samples were high quality pairedend reads. Filtered reads were mapped against the S. cerevisiae genome reference (R64-1-1 build) 3 using Bowtie (version 2-2.0.0-beta7) 4 and TopHat (version 2.0.4) 5 with options "-N 3 -library-type frunstranded". Aligned reads with 3 or more mismatches were discarded, and reads were aligned from the left end of the fragment to the transcript strand and right end of the fragment to the opposite strand. All high-quality paired-end reads were mapped to the S. cerevisiae genome. The number of reads mapped to each gene was counted using HTSeq (version 0.5.3p9) using the options "-s no -m union -t CDS -i gene_id". Reads were mapped to the coding sequence (CDS) irrespective of the strand and reads found in ambiguous overlapping CDS were not counted. The GFF gene co-ordinates file was obtained from the Saccharomyces Genome Database 6 and converted to GTF format using custom Perl scripts.
Different versions of RNA-analysis tools were used for the LF treatments and matched controls. These include Bowtie (version 2.2.4), TopHat (version 2.0.13) with options "-N 2 --library-type fr-unstranded" and Htseq (version 0.6.1p1) with the same options as described above.

Analysis of Differential Gene Expression
To identify the genes that were differentially expressed in S. cerevisiae treated with AMB, LF or AMB-LF, biological triplicate RNA-seq samples were analyzed for each treatment, with 18 samples in total. The data generated from HiSeq2000 and NextSeq platforms were analyzed separately. For the samples analyzed with HiSeq2000, there were on average 10 million reads per sample (approx. 170x coverage) and for the samples analyzed with NextSeq, there were on average 27 million reads per sample (approx. 170x coverage). Genes with low raw read counts were removed and only genes with 7 counts per million for at least 6 samples were kept for analysis. Differential expression of genes was analyzed using in R (version 3.2.5) and the edgeR library (version 3.12.0) 7 . Multiple dimension scaling was used to examine the similarity in read counts between the samples and to assess reproducibility of the replicates. The samples were normalized with respect to the library size and tag-wise biological coefficient of variation.
A linear model was used to adjust for lanes effect from sequencing the samples on different sequencing lanes. Statistically significant differentially expressed genes were identified using edgeR 7 . An adjusted pvalue of 0.05 was used as the cutoff for identifying differentially expressed genes. The log fold-change in gene expression that was calculated using edgeR was compared with those calculated using DESeq2 8 , with high reproducibility between the two methods (Supplementary Figure S2). The results generated using edgeR were used for all subsequent analyses as these identified a lower number of differentially expressed genes and were therefore more stringent.
For the analysis of the LF treatment, RNA-seq data were analyzed with R (version 3.2.5) and edgeR (version 3.12.0) 7 as above. Genes with low raw read counts were removed and only genes with 7 counts per million for at least 3 samples were kept for analysis. The samples were normalized with respect to the library size using the Trimmed Mean of M-values (TMM) method and the biological coefficient of variation was estimated using the tag-wise method 7 . The reproducibility of the replicates was analyzed using multidimensional scaling (Supplementary Figure S4a). Any batch effects from the experiments carried out on the different lanes were eliminated using a generalized linear model. An adjusted p-value of 0.05 was used as the cutoff for identifying differentially expressed genes. The log fold-change in gene expression calculated using edgeR. The edgeR results were compared with those obtained through DESeq2 (data not shown). No significantly differentially expressed genes were identified in the LF treatment and therefore this treatment was not subject to further analysis.

Self-Organizing Maps
Self-organizing map (SOM) analysis was used to identify clusters of differentially expressed genes with similar gene expression profiles. Transcript data from the AMB and AMB-LF treatments were first organized in two separate self-organizing maps (SOMs) to identify clusters of genes with correlated expression patterns. Several normalization steps were performed on the read counts for each sample, these included: 1) normalization with respect to the library size using the 'cpm' function in EdgeR, 2) normalization with respect to gene length to obtain the reads per kilobase per million (rpkm) value, 3) the log 2 rpkm values across the 12 samples were scaled to a mean of 0 and standard deviation of 1 per gene using Pareto scaling implemented in the 'genefilter' library (version 1.50.0).
The SOM was generated using the R 'kohonen' library (version 2.0.19) 9 . A 5 x 5 grid, the Von Neumann neighborhood function and 100 iterations of the learning algorithm were applied to each SOM. Each cluster contained genes with highly similar gene expression profiles, with neighbouring clusters having less similarity. The 5 x 5 grid was selected over other sizes (4 x 4 or 6 x 6) to identify clusters that were not too sparse or dense (data not shown).
Each self-organizing map figure was drawn using custom R scripts and the 'lattice' library (version 0.20-33). The gene expression profile for each gene consisted of the scaled values (y-axis) for the 12 samples (x-axis). The clusters were numbered from bottom to top and from left to right in the self-organizing map, such that the bottom-left cluster is number 1 and the top-right cluster is number 25. To provide a summary view of the SOM map, a heat map was used to present the average expression pattern for each cluster. The Euclidian distance function was used to assess the similarity in the gene expression profile and the 'average' agglomeration method was used to group the clusters.

Functional Enrichment Analysis
Lists of significantly differentially expressed genes from the AMB and AMB-LF treatments were separately analyzed for enrichment of Gene Ontology (GO) biological process terms using the ClueGO 10 tool. To identify active or inactive pathways, genes with increased or decreased gene expression were analyzed separately. The Bonferroni step down method was used to adjust the p-values, and GO terms with adjusted p-value < 0.05 were deemed significant. GO terms that shared a high number of genes were merged together, if the Cohen's kappa score was below 0.13, and the most significant GO term was considered representative and was shown for this analysis. The bar chart that plots the negative log p-values of each enriched GO term was drawn using Tableau visual analytic software (version 9.0).
Each cluster of co-expressed genes from the SOMs was also analyzed for enriched GO terms using the R statistical analysis package and the following libraries: 'GO.db' (version 3.2.2), 'org.Sc.sgd.db' (version 3.2.3), 'GOstats' (version 2.36.0) 11 and 'multtest' (version 2.26.0) 12 . Any GO term that mapped to one gene only was excluded from the analysis. To account for the hierarchical relationships among GO terms, a conditional hypergeometric tests was used to adjust the significance of a parent GO term based on the significance of the child GO terms. A GO term was considered to be significant if the adjusted pvalue was less than 0.01.

Comparison of enriched GO terms identified prior to and after gene expression profile clustering using SOM
We compared the number of enriched GO biological process terms that were identified either before or after the application of SOM to cluster genes with similar gene expression profiles (Supplementary Figure S7). We also tallied the number of enriched GO terms that were in common, before and after clustering. The analyses described above were performed separately on lists of differentially expression genes from AMB or AMB-LF treatments, and for each treatment the genes with increased or decreased expression were also analyzed separately. We also determined if relationships between GO terms in the GO hierarchical tree affected the number of unique GO terms identified before or after clustering with SOM. Relationships among the GO terms were obtained from http://www.geneontology.org/ontology/go.obo). To compare GO terms identified before and after clustering with SOM, two GO terms were counted as non-unique if they shared ancestor-descendant relationships in the GO hierarchical tree. High-level GO terms with a depth of less than 2 were excluded from analyses of the ancestor-descendant relationships.

Network-based Enrichment of Transcription Factor Targets in Co-expressed Gene Clusters
Network-based enrichment of transcription factor gene target relations from the Yeastract 13 and YeastMine 14 databases were used to investigate whether co-expressed genes in the same SOM cluster were likely to be co-regulated by a common transcription factor 13,14 . A list of 207 known S. cerevisiae transcription factors was obtained from the literature 15,16 . To check whether these transcription factors were responsible for the regulation of gene co-expression, each cluster was analyzed for an enrichment of targets for any of the known transcription factors. The result consisted of a list of transcription factors that have high probability of regulating the expression of the known target genes in the cluster. To increase the specificity of the transcription factors identified, two different approaches were used and the results that were in common between the two approaches were reported. The first approach involved the use of Yeastract 13 (version 2013-09-27), a database of curated transcription factor and target gene interactions. Fisher's exact test was used to calculate the probability of enrichment and the p-values were adjusted for false discovery rate. A transcription factor was considered to be enriched if the adjusted p-value was < 0.0001. For each cluster, any transcription factors with one or more target genes in the cluster were analyzed. The second approach used the YeastMine publication enrichment tool 14 . Each cluster of co-expressed genes was analyzed separately for an over-representation of known transcription factor and target genes relationships from the literature. Significant hits with Benjamini-Hochberg adjusted p-value of < 0.05 were filtered by searching for literature with a title containing the gene name of known transcription factors. For the two approaches, the background set included the total set of yeast genes in the SGD database.

Networks
To investigate transcription factor-target interactions and the protein-protein interactions between transcription factors and co-regulatory factors, an integrated network was constructed. This network was created from the union of the Yeastract 13 transcriptional regulatory network (version 2013-09-27). and the protein interaction network from Pang et al. (2012) 17 . Genes involved in iron uptake were as curated by Philpott and Protchenko (2008) 18 and genes involved in zinc homeostasis were as curated by Wu et al. (2008) 19 and Eide (2009) 20 . The integrated networks were visualized using Cytoscape version 3.1.1 21 . Log fold-changes from differential expression analysis were co-visualized as node colors.

Apoptosis and oxidative stress response genes
Genes involved in oxidative stress were identified from the Saccharomyces Genome Database (SGD) 6 23 were also added to our list.

Selection of genes for chemical-genetic experiments
A list of criteria was used to select query genes to perform chemical-genetic experiments with gene knockout mutants. The query genes were selected from the list of genes with significant differential expression in AMB or AMB-LF treatments. We narrowed the list further to genes that were representative of enriched biological processes involved in response to stress or drug treatments or were targets of transcription factors Aft1p and Zap1, as annotated in Yeastract 13 . Additional information was included to identify putative drug targets. Using OrthoMCL 24 , we checked whether the query S.
cerevisiae genes had orthologs in four fungal pathogens (C. neoformans, C. albicans, C. glabrata and A. fumigatus) and in humans. We also used reciprocal BLAST 25 searches to compare the proteomes of C. neoformans and S. cerevisiae, and reciprocal best hits are also identified as orthologs. We also checked whether deletion of the query gene was known to increase sensitivity to AMB (148 nM or 100 μM) in a large-scale chemical-genetic screen 26 , and whether the S. cerevisiae proteins of interest shared common protein domains (Pfam version 27.0) 27 with 'druggable' proteins from the DrugBank database (version 4.1) 28 .

Saccharomyces cerevisiae haploid single gene knockout mutants
Knockout mutants were obtained from the Yeast Deletion Project 29 for the S. cerevisiae MATa BY4741 strain. Knockouts were confirmed using PCR as previously described by the Yeast Deletion Project KanC-D, amplifications from primers A-D sufficed. Additionally, primers A-B were used to amplify DNA sequences from the wild type strain BY4741 to confirm gene identity 30 .

Saccharomyces cerevisiae gene knockout design (YOR387C)
Gene knockouts were performed using a modified protocol from Janke et. al. (2004) 31 . Plasmid cassettes pFA6a-natNT2 and pFA6a-hphNT1 were used to generate a yor387cΔ knockout mutant with BY4741 and a yor387cΔ/vel1Δ double knockout from the vel1Δ knockout mutant. Primer sequences 45-55 bp before and after and inclusive of the start and stop codon of YOR387C were designed and tagged with a common sequence homologous to the marker cassette within the plasmid at the 3' end. The primers designed were: S1 -5' ttctgatagattgtacaatctcaagaaatcaagaacaacaaccataccatgcgtacgctgcaggtcgac  gene deletion mutants a) aft1Δ, atg1Δ, cch1Δ, tpk2Δ, vel1Δ, yap5Δ, yct1Δ, yor387cΔ,  vel1Δ/yor387cΔ and zap1Δ, b) aft1Δ, met32Δ, mtd1Δ, were compared to the S. cerevisiae wild type (BY4741) background. Ten-fold serial dilutions of the strains were plated on synthetic complete agar plates from left to right starting at 10 6 cells/mL with different types of stressors. 'D' indicates the days of growth at 30 °C. Stressors tested include antifungal drugs (amphotericin B (AMB), fluconazole (FLC)), oxidative stress (H 2 O 2 ), nitrosative stress (NaNO 2 ), heat stress (37 °C) and cell wall stressing agents (NaCl, SDS, calcofluor white (CW) and caffeine). CW was tested at 200 µg/mL except for vel1, yor387c and vel1/yor387c where 300 µg/mL CW was used. In (a) cch1 was tested later with only selected stressing agents. Only aft1Δ and zap1Δ mutants showed increased sensitivity to AMB and other types of stressors. These suggest Aft1p and Zap1p are important for resistance to AMB and oxidative stress. The aft1Δ strain had increased susceptibility to calcofluor white, NaCl, NaNO 2 , SDS and caffeine, and the zap1Δ strain was susceptible to all stressors except SDS, NaNO 2 and NaCl. These results indicate Aft1p and Zap1p are important in stress responses in general. c) The aft1Δ and zap1Δ mutants were further tested with AMB in combination with LF. Both aft1Δ and zap1Δ showed growth retardation in the presence of AMB-LF but not to the individual agents alone.
Supplementary Figure S7: The numbers of enriched GO biological processes terms identified prior to and after SOM clustering. Results were divided into groups based on the type of treatment, increased or decreased gene expression and whether clustering had been applied prior to GO term enrichment (graph on the left hand side). The horizontal bars show the number of enriched GO terms identified prior to clustering analysis (green bar), after clustering analysis (orange bar) or shared by both analyses (blue bar). To demonstrate that the increased percentage is due to identification of GO terms unique to the clustering analysis, we reanalyzed the data and GO terms were considered to be unique only if related ancestor-descendent relationships were not identified prior to clustering (graph on right hand side). We consistently observed a higher number of enriched GO terms after clustering analysis. Yeastmine adjusted pvalue c Cluster ID 10 a) Descriptions from the Saccharomyces Genome Database b) Transcriptional regulatory network from the Yeastract database (adj. p-value < 0.01) c) Transcription factor and target genes relationships from the Yeastmine database (adj. p-value < 0.05)  Supplementary Data S1: Significant differentially expressed genes for the AMB and AMB-LF treatments.

Supplementary
 Supplementary Data S2: Significant differentially expressed genes grouped into clusters using Self-Organizing Maps.
 Supplementary Data S3: Functional enrichment results for significant up-or down-regulated genes from AMB and AMB-LF treatments.
 Supplementary Data S4: Functional enrichment results for each cluster of genes from the Self-Organizing Maps.