Introduction

Production of advanced biofuels, including cellulosic ethanol, has attracted growing interest amid increasing concerns over a sustainable energy supply and a cleaner environment1,2. Developing next-generation biocatalysts with robust and tolerant characteristics is a necessity and one of the significant challenges for a cost-efficient bio-based economy. The traditional industrial ethanologenic yeast Saccharomyces cerevisiae is a candidate for next-generation biocatalyst development owing to its high ethanol titer and robust performance in ethanol conversion. Numerous efforts have been made to improve S. cerevisiae for advanced biofuel applications; however, results obtained from laboratory strains and industrial yeast strains are not always consistent. For example, laboratory model strains of S. cerevisiae typically show transient expression responses to various environmental stimuli3,4, whereas industrial yeast strains display more persistent expression in response to biomass inhibitory compounds5,6. With a laboratory strain as the host, heterologous xylose transporter genes showed poor or negative function7,8. In contrast, when an industrial strain was used as the host, several heterologous xylose transporters showed strong expression and significantly improved xylose uptake and utilization9. Many explanations for these differences have been proposed, and most remain unresolved. Laboratory strains equipped with many genetic markers for phenotype selection are convenient; however, such strains often carry mutated gene copies and allelic variations (http://www.yeastgenome.org/) and may not reflect the response or performance of an industrial yeast10. For ethanologenic yeast, many genes and metabolic pathways have been found to be responsible for various stress responses11,12. Sequence rearrangements and copy number changes of adaptive genes are likely related to the unique metabolic pathways of industrial yeasts13,14.
However, the genomic background of industrial yeast, especially genome structure in relation to stress tolerance (including inhibitor tolerance), remains largely unknown. Interpreting industrial yeast performance through the known mechanisms of well-characterized model yeast is often incorrect or misleading. The lack of clearly defined knowledge of the genomic origins of different yeast strains hinders development of next-generation biocatalysts for advanced biofuel production.

In this study, using the industrial yeast strain NRRL Y-12632 as an example, we investigated its genome sequence variations and its transcriptome response to 5-hydroxymethyl-2-furaldehyde (HMF), a representative toxic compound derived from lignocellulosic biomass pretreatment, in comparison with the model strain S288C. Strain Y-12632 was selected from 16 industrial yeast strains evaluated15 based on its more tolerant phenotype and response to inhibitory compounds. It is a commonly recognized industrial type strain, also known as CBS1171, originally isolated from brewer's top yeast in the Netherlands in 1925 and deposited at the Centraalbureau voor Schimmelcultures (CBS), Utrecht, The Netherlands16. As a type strain, it has been collected worldwide and is also known as ATCC 18824, AWRI74, CCRC 21447, DBVPG 6173, DSM 70449, IFO 10217, IGC 4455, JCM 7255 and NCYC 505 [17]. Novel aldehyde reductase genes and candidate genes for tolerance to biomass fermentation inhibitors have been identified from strain NRRL Y-12632 [18,19]. Key regulatory elements in the genomic adaptation of strain Y-12632 to HMF have also been reported5. Using an integrated genome and transcriptome analysis, we identified extensive genome-wide sequence variations in the industrial yeast Y-12632 relative to the model strain S288C. Our transcriptome analysis of Y-12632 provided the first insight into the significant roles of tolerance-related signaling pathways in yeast exposed to biomass pretreatment inhibitors. The unique genomic background of strain Y-12632, its high level of genome sequence variation and the signature expression patterns uncovered by this study provide new knowledge for understanding the tolerance of industrial yeast, and these results should aid development of next-generation biocatalysts for advanced biofuel production.

Results and Discussion

Genome sequence analysis of industrial type strain NRRL Y-12632

The Y-12632 genome was sequenced using Illumina GA-IIx technology. Stringent trimming yielded 23 million high-quality reads, corresponding to 160-fold coverage of the genome. De novo assembly was performed with Velvet 1.0.14 [20], producing 238 contigs with an N50 of 94.2 kbp and a total length of 11.6 Mbp (Table S1). Based on the assembly, 5,380 polypeptide-encoding open reading frames (ORFs) were predicted using the S288C gene models as a reference. Protein sequence alignment showed that 98.6% and 98.3% of these ORFs were highly homologous to entries in the non-redundant protein (NR, http://www.ncbi.nlm.nih.gov/) database and the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.kegg.jp/) database, respectively. A total of 3,605 protein-coding genes were annotated by InterProScan21. We observed an ORF for the gene FPG1, which is not present in the S288C genome. FPG1 was recently reported to encode a mannoprotein precursor that promotes foaming in wine strains22. A few other ORFs absent from S288C encode hypothetical proteins or proteins of unknown function. These sequences are also commonly observed in other wine or beer yeast strains such as EC1118 [23]. Based on the genome sequences (SGD, http://www.yeastgenome.org/) of eight yeast strains from a variety of sources and applications, including laboratory, bakery, wine, industrial bioethanol, sake and natural isolates, together with that of Y-12632, 942 single-copy orthologous gene sets were identified and used to construct a phylogenetic tree (Fig. S1). Significant genetic variability is known to exist among S. cerevisiae strains of wild-type, laboratory and industrial origin. In general, clustering of yeast strains by sequence variation corresponds more closely to their technological application than to their geographical origin24.
Consistent with this observation, our phylogenetic analysis showed that Y-12632 was not closely related to the laboratory model, wine or other types of strains (Fig. S1).
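
The N50 reported above summarizes assembly contiguity: it is the contig length at which contigs, taken from longest to shortest, first cover half of the total assembled length. A minimal Python sketch of that calculation follows; the function name is hypothetical and the contig lengths are toy values, not the Y-12632 assembly.

```python
def n50(contig_lengths):
    """Return the length of the contig at which the cumulative length,
    summed from the longest contig down, first reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Toy contig lengths (bp), not the Y-12632 assembly.
print(n50([100, 80, 60, 40, 20]))  # 80
```

With a total of 300 bp, the first two contigs (100 + 80 bp) pass the 150 bp halfway mark, so the N50 is 80 bp.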

Using the S288C genome as a reference, we identified a total of 32,811 single nucleotide polymorphisms (SNPs) in Y-12632 after stringent filtering for high sequencing depth (>70) and quality value (>2100) (Supplementary Dataset 1; Fig. S2). PCR sequencing of 79 selected SNP loci confirmed 69 of them, indicating 87% accuracy of the global SNP map (Methods). The SNPs in Y-12632 were distributed widely across the 16 chromosomes and the mitochondrial genome (Supplementary Dataset 1). Approximately 16,000 of the detected SNPs were located in exonic regions and 6,691 in intergenic regions (Fig. 1A). More than 8,000 variants resided within 5 kb upstream of coding regions, where they could alter protein-binding motifs and thereby affect transcriptional regulation. Approximately 70% of exonic SNPs were non-synonymous mutations, involving 3,740 genes. Genes harboring these variations were significantly enriched in the molecular function categories sequence-specific DNA binding (P = 1.24e-09), transcription regulator activity (P = 3.23e-06), adenyl nucleotide binding (P = 5.00e-04) and protein serine/threonine kinase activity (P = 2.00e-04) (Fig. 1B). A summary of all SNPs identified in Y-12632, with overlapping protein domains and GO terms, is presented in Supplementary Dataset 1 and Table S2. More than 700 SNPs affected stop codons, accounting for approximately 4% of the total SNPs (Fig. 1A).
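
The partition of SNPs into coding, upstream and intergenic classes can be sketched as below. This is a simplified illustration, not the annotation pipeline used here (SNPs were annotated with NGS-SNP; see Methods). It assumes each gene is a single continuous coding span, which ignores the few intron-containing yeast genes, and treats "upstream" as within 5 kb 5′ of the start codon on the gene's own strand. The gene records and names (YFG1, YFG2) are hypothetical. Classes are made mutually exclusive by precedence (coding first), so counts from this sketch need not match how the totals above were tallied.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_snp(chrom, pos, genes, upstream=5000):
    """Return 'coding', 'upstream' or 'intergenic' for a 1-based SNP position."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    # Coding takes precedence: the SNP lies inside a gene span.
    if any(g.start <= pos <= g.end for g in same_chrom):
        return "coding"
    for g in same_chrom:
        # The 5' flank is before `start` on "+" genes and after `end` on "-" genes.
        if g.strand == "+" and g.start - upstream <= pos < g.start:
            return "upstream"
        if g.strand == "-" and g.end < pos <= g.end + upstream:
            return "upstream"
    return "intergenic"


# Hypothetical genes on a toy chromosome.
genes = [Gene("YFG1", "chrI", 10000, 11000, "+"),
         Gene("YFG2", "chrI", 20000, 21000, "-")]
for pos in (10500, 9000, 21500, 15000):
    print(pos, classify_snp("chrI", pos, genes))
```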

Figure 1

Genome sequence analysis of industrial yeast Saccharomyces cerevisiae type strain NRRL Y-12632.

(A) Distribution of single nucleotide variations in the Y-12632 genome. (B) Functional categories of genes containing SNPs in Y-12632 genome with significant higher levels of variations marked against reference genome S288C.

Sequence variation impacts global metabolic pathways

At the genomic level, over 23,000 missense and nonsense SNPs were associated with protein-coding genes from 94 KEGG pathways (Fig. 2). To probe the functional implications of these SNPs, we analyzed time-course data on global gene expression of Y-12632 in response to HMF. Differential gene expression during the lag phase is considered an adaptive response of the yeast to toxic compounds liberated by biomass pretreatment5,6. Using Significance Analysis of Microarrays (SAM)25 in time-course mode, we identified a total of 2,736 genes that were up-regulated upon HMF challenge (Table S3). These genes were involved in 90 KEGG pathways, of which 60 metabolic pathways contained both genes with enhanced expression and genes with non-synonymous SNPs, involving more than 1,700 genes in total (Fig. 2; Table S4).
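
The pathway-level overlap described above amounts to set intersections per pathway. A minimal sketch, assuming each KEGG pathway is represented as a set of gene identifiers; the pathway and gene names below are placeholders. Because the text does not specify whether the up-regulated and SNP-carrying genes had to be the same genes, the sketch keeps a pathway when it contains at least one of each and, separately, lists the genes meeting both criteria.

```python
def pathways_with_both(pathway_genes, upregulated, nonsyn_snp_genes):
    """For pathways containing at least one up-regulated gene and at least one
    gene with a non-synonymous SNP, return the genes meeting both criteria."""
    result = {}
    for pathway, genes in pathway_genes.items():
        up = genes & upregulated
        snp = genes & nonsyn_snp_genes
        if up and snp:
            result[pathway] = sorted(up & snp)
    return result


# Placeholder pathways and genes, not KEGG data.
pathway_genes = {"pathway_1": {"A", "B", "C"},
                 "pathway_2": {"D", "E"},
                 "pathway_3": {"F"}}
print(pathways_with_both(pathway_genes, {"C", "D", "F"}, {"B", "C", "E"}))
# {'pathway_1': ['C'], 'pathway_2': []}
```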

Figure 2

The global metabolic pathways of industrial yeast Saccharomyces cerevisiae type strain NRRL Y-12632.

Pathways with genes showing consistently up-regulated transcription in response to HMF challenge are highlighted in blue, those with genes harboring non-synonymous SNPs in green, and those with genes that both show consistently up-regulated transcription in response to HMF challenge and harbor non-synonymous SNPs in red.

Many of these genes were associated with multiple pathways, such as carbohydrate, amino acid and nucleotide metabolism (Fig. 2; Table S4). Tolerance of industrial yeast to pretreatment inhibitors has frequently been associated with glycolysis, butanoate metabolism, alanine, aspartate and glutamate metabolism, valine, leucine and isoleucine biosynthesis, and the metabolism of arginine, proline, histidine, tyrosine, tryptophan and glutathione12,26. Our findings are consistent with these reports. In addition, we found many single nucleotide variations associated with nucleotide metabolism, particularly in the purine and pyrimidine metabolism pathways, which had 35 and 23 SNPs, respectively. Purines and pyrimidines are building blocks of nucleic acids. Their metabolism also supplies many energy-carrying molecules, such as ATP used in most reactions, GTP in protein synthesis, UTP for activating glucose and galactose and CTP in lipid metabolism, while AMP is a structural component of NAD and coenzyme A. These pathways are also closely linked to the pentose phosphate pathway and many amino acid metabolism pathways, and they merit further investigation in yeast tolerance studies. Furthermore, our study revealed two tolerance-related signal transduction pathways, the mitogen-activated protein kinase (MAPK) signaling pathway and the phosphatidylinositol signaling system, that have a great impact on yeast performance against HMF and possibly other biomass pretreatment inhibitors.

Tolerant MAPK signaling pathways

Numerous metabolic pathways have been reported to affect yeast tolerance in high-throughput assays, including transcriptome and proteome analyses5,27,28. However, the roles of the MAPK pathway, especially in tolerance to toxic compounds liberated by biomass pretreatment, remain largely unknown. The MAPK signaling pathway regulates a variety of cellular activities, including cell proliferation, differentiation, survival and death. It comprises sensor receptors, transduction proteins and three protein kinases acting in series: a MAP kinase kinase kinase (MAPKKK), a MAP kinase kinase (MAPKK) and a MAP kinase (MAPK)29. In S. cerevisiae, five functionally distinct MAPK cascades have been characterized, and each regulates diverse gene expression by phosphorylating transcription factors and other targets30. In Y-12632, genes with non-synonymous sequence variations and differential expression were found in at least three MAPK pathways: the filamentous growth (FG), high osmolarity glycerol (HOG) and cell wall integrity (CWI) pathways.

In the FG, HOG and CWI pathways, we found more than 160 non-synonymous SNPs in at least 41 genes, with up to 19 SNPs within a single gene. Our quantitative gene expression analysis singled out 25 genes in these three MAPK pathways that displayed consistently higher signature expression under HMF challenge than in the untreated control (Fig. 3B). Of these, 21 genes overlapped with the genes containing non-synonymous SNPs (Fig. 3A). For example, the transmembrane sensor genes SLN1, WSC2 and WSC3 had nine, five and three non-synonymous mutations, respectively (Table S5), all of which were confirmed by independent PCR sequencing. These up-regulated sensor genes in the HOG and CWI pathways appeared to interact effectively with downstream genes such as SSK1, FKS1 and PKC1 to activate their respective MAPK cascades as direct signal transduction channels in response to HMF challenge. This includes the enhanced signature expression of BCK1, MKK1, MLP1, MLP2 and PLC1 in the CWI pathway and of SSK2, SSK22 and STE11 in the HOG pathway (Fig. 3A, B). Sequence variations have been observed to affect gene performance; for example, non-synonymous SNPs of MSN4 altered gene expression31,32. Mutation of amino acids in the extracellular domain of SLN1 was reported to significantly affect downstream proteins such as Hog1p, causing protein dephosphorylation33. Since all nine non-synonymous SNPs of SLN1 observed in this study were located in the extracellular domain, the altered amino acids are expected to affect downstream interactions in the HOG pathway. Moreover, SLN1 exhibited steadily enhanced signature expression in response to HMF. The large number of SNPs in these up-regulated genes was statistically significant.
In the FG pathway, a normally expressed membrane sensor activated the MAPK cascade, with up-regulated signature expression of STE11, STE7 and KSS1. All genes involved in this pathway carried sequence mutations. It is important to note that genes expressed at normal levels under inhibitory challenge, such as SHO1, CDC42 and STE20, are important and necessary for functional cells to maintain the overall interaction flow. In a non-tolerant wild-type strain, such genes were observed to lose function 48 h after inhibitor challenge11. Thus, persistent expression under stress should be considered a tolerance characteristic of a gene. Most of these normally expressed genes also contained non-synonymous SNPs (Fig. 3A). Whether these sequence variations are involved in the fine-tuned expression response is currently unknown.

Figure 3

Genomic and phenotype analysis of selected MAPK signaling pathways against HMF in S. cerevisiae Y-12632.

(A) Illustrative networks showing normally expressed genes (in gray) and genes with enhanced expression (in red). Genes with functional SNPs are indicated with a star (*). (B) Gene expression over time during the lag phase for selected genes in response to HMF.

We found seven SNPs of MLP2, part of the CWI pathway MAPK cascade, to be particularly interesting in Y-12632. These mutations were located in the upstream region at positions -5, -46, -119, -317, -394, -1557 and -1722. UTR sequences, especially 5′ UTRs, play a concerted role in regulating gene expression under stress conditions34. Among the 5′ UTR SNPs observed, the mutation at position -5 created a CGGNS binding motif for the transcription factor Stb5p. Stb5p is an important regulator mediating multidrug resistance and the oxidative stress response, which is a key element of yeast tolerance to biomass pretreatment inhibitors. In addition, the SNPs at positions -119 and -317 created two additional Azf1p binding motifs (AAAAGAAA). Thus, Y-12632 has a total of four Azf1p binding motifs upstream of the MLP2 coding region, twice the number in S288C. Azf1p is a zinc-finger transcription factor that activates genes involved in maintaining cell wall integrity35,36. The doubled number of motifs is expected to enhance protein binding and enable stronger expression of this tolerance gene in response to the toxic chemicals. We also observed that PKC1 and PLC1, key connecting nodes of the MAPK-CWI pathway, harbored multiple SNPs in their upstream regions. However, these mutations did not generate any transcription factor binding motifs, and their possible functional impact is unclear. Nonetheless, the significantly activated CWI pathway responses observed in this study support the CWI pathway as an important component underpinning Y-12632 performance under HMF challenge. The enriched sequence variation in many genes of this pathway, together with their enhanced signature expression under HMF exposure, suggests a potential role for this pathway in yeast adaptation to HMF stress.
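
Motif gains such as those described for MLP2 can be checked by scanning the reference and variant upstream sequences for IUPAC-coded motifs and comparing the hit counts. The sketch below is an illustration only, not the YEASTRACT or SCPD search used in this study (Methods): it scans a single strand, reports overlapping hits, and uses an invented toy sequence rather than the MLP2 promoter. In CGGNS, N matches any base and S matches C or G.

```python
import re

# IUPAC nucleotide codes used in motif definitions.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def find_motif(seq, motif):
    """Return 0-based start positions of an IUPAC motif on the given strand.

    The pattern is wrapped in a lookahead so overlapping matches are reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


# Invented toy sequence, not the MLP2 upstream region.
toy = "AAAAGAAA" + "TT" + "AAAAGAAA" + "CGGAC"
print(find_motif(toy, "AAAAGAAA"))  # [0, 10]
print(find_motif(toy, "CGGNS"))     # [18]
```

Comparing len(find_motif(reference_seq, motif)) with len(find_motif(variant_seq, motif)) indicates whether a SNP gains or loses a motif copy.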

To confirm the roles of these candidate genes in HMF tolerance, we examined the growth response of the corresponding single-gene deletion mutants for the following selected genes: SHO1, STE7, KSS1, DIG1, DIG2 and TEC1 from the FG pathway; SSK1, STE11, SSK22, SSK2, MSN2 and MSN4 from the HOG pathway; and WSC2, FKS1, BCK1, MLP1, MLP2, PLC1, SWI4, SWI6, RLM1 and GSC2/FKS2 from the CWI pathway. On medium without HMF, the wild-type strain BY4742 and all single-gene deletion mutants grew normally and reached stationary phase approximately 24 h after inoculation (Fig. S3). When 20 mM HMF was added to the medium, the control strain still grew, but at a slower rate, reaching stationary phase at 32 h. In contrast, growth of all mutants was severely repressed on HMF-containing medium, and no growth was observed until 48 h, except for ΔWSC3, which recovered growth after a 24 h lag phase. Most single-gene deletion mutants showed no growth on HMF-containing medium even 72 h after inoculation. These results indicate that each of the tested candidate genes is required for yeast survival and growth under HMF challenge.

For many genes involved in downstream interactions of the MAPK pathways, we also observed high levels of sequence variation accompanied by greater expression in response to HMF challenge (Fig. 3A). These genes, including the transcription factor and regulator genes RLM1, SWI6, SWI4, MSN2, MSN4, DIG1, DIG2, STE12 and TEC1 and the glucan synthase gene FKS2, mediate a wide range of biological functions relevant to yeast tolerance, such as stress response, cell cycle and cell wall integrity. In general, expression levels of these genes increased over time with HMF exposure (Fig. 3B). The HOG and CWI pathways have been reported to play important roles in adapting to hyperosmotic stress and maintaining cell wall integrity under stressful conditions37. High levels of protein expression in the HOG pathway have been reported in response to pretreatment inhibitors38. The HOG1 pathway has also been reported to be involved in tolerance to furfural and acetic acid39,40. Our observations concur with these reports. As mentioned above, the CWI pathway was especially noteworthy in the response to HMF challenge and may be a key component of yeast tolerance to HMF and other pretreatment inhibitors. Its role in tolerance is discussed further in the following section in relation to the phosphatidylinositol signaling pathway.

The unique phosphatidylinositol signaling pathways

Potential functional alterations arising from DNA sequence variations also appeared in the phosphatidylinositol signaling pathways. Phosphoinositides (PIs), derived from phosphatidylinositol by phosphorylation, are regulatory lipids that function in signal transduction and mediate numerous physiological processes in eukaryotes, such as growth, cytoskeletal rearrangement and membrane trafficking41. Thirteen genes in the PI signaling system of Y-12632 harbored numerous non-synonymous SNPs (Fig. 4A). Of these, nine genes, VPS34, FAB1, INP52, INP54, PIK1, PLC1, PIS1, INM2 and PKC1, showed consistently enhanced signature expression in response to HMF challenge (Fig. 4B). For example, VPS34 (encoding the PI 3-kinase crucial for synthesis of phosphatidylinositol 3-phosphate [PI(3)P]) and its downstream membrane kinase FAB1 had ten and nine non-synonymous mutations, respectively (Table S5). Both genes showed enhanced signature expression favoring PI(3,5)P2 synthesis. Accumulation of PI(3,5)P2 has been observed in association with osmotic stress in yeast42. Active expression of genes for PI(3,5)P2 biosynthesis indicates a tolerant response to the imposed inhibitor stress. In the parallel PI(4)P synthesis branch, the reaction catalyzed by the kinase encoded by PIK1 was also highly activated. PIK1 has been recognized as an essential gene for synthesis of the PI(4)P used in the secretory machinery of the yeast cell43. All three genes in Y-12632 differ from S288C by non-synonymous mutations in their exonic regions. INP52 and INP54 were also up-regulated, favoring biosynthesis of IP1/IP3. These genes encode inositol polyphosphate 5-phosphatases with 5-phosphatase and polyphosphoinositide phosphatase activities that are essential in mediating PI metabolism.
As with the confirmation of gene functions for the MAPK pathways, we evaluated the growth response of selected genes to HMF using the corresponding single-gene deletion mutants. The highly sensitive growth response of these mutants to HMF confirmed the tolerance functions of the candidate genes (Fig. S4).

Figure 4

Genomic and phenotype analysis of phosphatidylinositol signaling system against HMF in S. cerevisiae Y-12632.

(A) Illustrative networks showing normally expressed genes (in gray) and genes with enhanced expression (in red). Genes with functional SNPs are indicated with a star (*). (B) Gene expression over time during the lag phase for selected genes in response to HMF.

Using independent PCR amplification and sequencing, we confirmed six non-synonymous SNPs of INP54, five of which were located in its polyphosphate 5-phosphatase domain. One mutation changed Lys264 to Met264, which SNAP44 predicted to be a non-neutral substitution. We recently demonstrated that the Val285 to Asp285 substitution in Gre2p significantly increased aldehyde reduction activity using NADPH as an additional cofactor in yeast45. Thus, DNA sequence variation in industrial yeast can lead to amino acid substitutions that potentially affect gene functions and gene interactions. Interestingly, these mutated genes in Y-12632 showed highly activated signature expression under HMF stress.
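
Whether a coding SNP is synonymous, missense or nonsense follows from translating the reference and variant codons. A minimal sketch using the standard nuclear genetic code; yeast mitochondrial genes use a different code, so it does not apply to the mitochondrial SNPs. The codons in the example are illustrative, since the INP54 codon sequence is not given here, although a single-base change from lysine to methionine must be AAG to ATG because ATG is the only methionine codon. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon change as synonymous, missense, nonsense or stop-loss."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"   # stop gained
    elif ref_aa == "*":
        effect = "stop-loss"  # stop codon replaced by a sense codon
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


print(classify_codon_change("AAG", "ATG"))  # ('missense', 'K', 'M')
```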

It is important to note that the PI signaling system is closely related to the MAPK pathways, especially the CWI pathway, through interactions mediated by PLC1 and PKC1. Strikingly, both genes contained substantial amino acid substitutions. Amino acid changes caused by SNPs in HOG1 and PBS2 have been observed to cause abnormal cross-talk between the MAPK-HOG pathway and the pheromone response pathway upon osmotic stress46. Highly activated expression of PLC1 and PKC1, key genes linking the PI signaling system and the CWI pathway, is expected to reflect the impact of these sequence mutations and the associated amino acid substitutions.

Conclusions

Research and development of yeast tolerance to pretreatment inhibitors is a rapidly growing field for advanced biofuel applications. Many candidate genes and regulatory elements have been identified in the yeast response to inhibitory compounds such as furfural and HMF5,18,19,47. However, little is known about the signal transduction pathways involved in yeast tolerance to biomass pretreatment inhibitors. This study provides the first insight into the unique genomic background of an industrial yeast type strain with respect to chemical stress tolerance, in comparison with a model reference genome. Two major technical challenges exist in developing next-generation biocatalysts for advanced biofuel production: improving stress tolerance, including inhibitor tolerance, and achieving efficient and balanced utilization of C-5 and C-6 biomass sugars from lignocellulosic hydrolysates. Yeast strain improvement has been widely studied using both laboratory and industrial yeast strains. Phenotypic results from model laboratory strains and industrial yeast strains are often inconsistent, which makes it challenging to address these technical difficulties given their different genomic backgrounds. A model laboratory strain offers the great advantage of convenient genetic tools; however, results derived from laboratory strains need careful interpretation, since they may not reflect the response or performance of an industrial yeast. A recently reported significant improvement of heterologous xylose transporter function in an industrial strain of S. cerevisiae suggested that industrial yeast is a more suitable host for improving xylose utilization9. In this study, the identification of tolerance-related MAPK and PI signaling pathways in Y-12632 responding to biomass pretreatment inhibitors provides a strong basis and rationale for using industrial yeast in tolerant strain development.
Thus, the value of industrial yeasts as both workhorses and research models cannot be overemphasized in the development of next-generation biocatalysts for advanced biofuel applications.

Methods

Yeast strains and culture conditions

S. cerevisiae strain NRRL Y-12632 (Agricultural Research Service Culture Collection, Peoria, IL, USA) was maintained and cultured on synthetic complete (SC) medium. Nonessential haploid S. cerevisiae deletion mutants generated by the Saccharomyces Genome Deletion Project and the parental strain BY4742 (MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0) were obtained from Open Biosystems (Huntsville, AL). BY4742 is a designed deletion strain derived directly from Saccharomyces cerevisiae S288C and is widely used as a parental strain in the international systematic Saccharomyces cerevisiae gene disruption project48.

Freshly grown cells harvested at logarithmic phase after 16 h of incubation at 30°C with agitation at 250 rpm were used as the inoculum. Cells were incubated in SC medium using a flask fermentation system at 30°C as previously described15.

Genome sequencing and analysis

Genomic DNA of Y-12632 was isolated and sequenced on a Solexa GA-IIx using a paired-end strategy. Paired reads with an average length of 75 bp and a pair distance of 300 bp were collected and assembled using Velvet (v1.1.07). Contigs shorter than 500 bp were removed from the final assembly. Gene structures were predicted using Augustus (v2.5.5) with the gene models of S. cerevisiae S288C as a reference. Orthology and sequence identity of each predicted protein were evaluated by BlastP searches (E-value < 1e-5) against the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/), NCBI-NR (http://www.ncbi.nlm.nih.gov/) and KEGG (http://www.kegg.jp/) databases, and the best hit was used for the final annotation. Functional domains and GO annotations were obtained with InterProScan.
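
The best-hit annotation step can be sketched for BLAST tabular output (-outfmt 6, whose 11th and 12th columns are the E-value and bit score). Ranking hits by bit score is an assumption; the text does not state how the best hit was chosen. The function name and the query and subject identifiers are placeholders.

```python
def best_hits(blast_tab_lines, max_evalue=1e-5):
    """Map each query to its highest-bit-score subject among hits with E < max_evalue.

    Expects BLAST tabular lines (-outfmt 6): column 11 = E-value, column 12 = bit score.
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # fails the E-value cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


hits = [
    "q1\tsA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200",
    "q1\tsB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-20\t90",
    "q2\tsC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t30",
]
print(best_hits(hits))  # {'q1': 'sA'}
```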

To identify genes unique to Y-12632, the predicted ORFs in the Y-12632 genome were searched against all S288C proteins by BlastP using an E-value cutoff of 1e-5 and an identity cutoff of 80%, with identity defined here as alignment length/query length. To avoid missing alignments because of gene prediction errors, the unaligned Y-12632 proteins were searched against the S288C genome by tBlastN with the same E-value (1e-5) and identity (80%) cutoffs. Proteins in Y-12632 that failed to align to S288C by either BlastP or tBlastN were considered unique to Y-12632.
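
The two-pass filter for unique genes can be expressed as set operations over parsed alignment results. In this sketch each hit is a hypothetical (query ID, E-value, alignment length, query length) tuple, and the 80% cutoff is applied to alignment length divided by query length, as defined above. Parsing the BlastP and tBlastN output into these tuples is not shown, and the function names are illustrative.

```python
def aligned_queries(hits, max_evalue=1e-5, min_fraction=0.8):
    """Queries with at least one hit passing the E-value and aligned-fraction cutoffs.

    Each hit is a (query_id, evalue, alignment_length, query_length) tuple.
    """
    return {q for q, evalue, aln_len, q_len in hits
            if evalue < max_evalue and aln_len / q_len >= min_fraction}


def unique_genes(all_queries, blastp_hits, tblastn_hits):
    """Proteins aligning to S288C by neither BlastP nor the tBlastN rescue pass."""
    unaligned = set(all_queries) - aligned_queries(blastp_hits)
    # Second pass: only the BlastP-unaligned proteins are rechecked with tBlastN.
    rescued = aligned_queries(tblastn_hits) & unaligned
    return unaligned - rescued


print(unique_genes(
    ["p1", "p2", "p3"],
    blastp_hits=[("p1", 1e-30, 95, 100), ("p2", 1e-30, 50, 100), ("p3", 0.1, 100, 100)],
    tblastn_hits=[("p2", 1e-10, 90, 100)],
))  # {'p3'}
```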

Phylogenetic analysis

Orthologs among the selected yeast strains were identified with a Markov clustering algorithm (OrthoMCL v.4)49 using an inflation index of 1.5. The protein-coding gene set of each genome was searched against all genes in all genomes (including its own) by BLAST with an E-value cutoff of 1e-5. Orthologous groups were then generated by MCL with an inflation index of 1.5, in which each gene is an ortholog of all other members of the same group. For each orthologous gene set, the encoded protein sequences were aligned with MUSCLE (v3.7)50. The alignments were curated with Gblocks (v0.91b)51 to remove poorly aligned positions. The curated alignments were then analyzed with PhyML (v3.0)52 to generate maximum-likelihood phylogenetic trees. A consensus tree was then constructed from all single-copy orthologous gene sets.
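
Selecting the single-copy orthologous gene sets used for the consensus tree can be sketched as follows. The sketch assumes OrthoMCL-style group lines in which each member is written as taxon|gene_id; the function name, taxon labels and gene IDs are placeholders.

```python
from collections import Counter


def single_copy_groups(group_lines, taxa):
    """Select ortholog groups containing exactly one gene from each taxon.

    Assumes lines of the form "GROUP: taxon|gene_id taxon|gene_id ...".
    """
    selected = {}
    for line in group_lines:
        if not line.strip():
            continue
        name, members = line.split(":", 1)
        genes = members.split()
        counts = Counter(gene.split("|", 1)[0] for gene in genes)
        # Exactly one member per listed taxon and no members from other taxa.
        if len(genes) == len(taxa) and all(counts[t] == 1 for t in taxa):
            selected[name.strip()] = genes
    return selected


groups = ["OG1: Y12632|g1 S288C|s1 strainX|x1",
          "OG2: Y12632|g2 Y12632|g3 S288C|s2 strainX|x2",  # two Y12632 copies
          "OG3: Y12632|g4 S288C|s3"]                       # strainX missing
print(list(single_copy_groups(groups, ["Y12632", "S288C", "strainX"])))  # ['OG1']
```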

SNP identification and analysis

Sequence reads were mapped to the S. cerevisiae S288C genome using BWA53. Variant calling was performed separately with SAMtools v0.1.16 [54] and GATK v1.0 [55] using default parameters. A custom Python script was then used to take the intersection of the SNPs predicted by SAMtools and GATK. The results were further filtered to remove SNPs with low quality values and low read coverage. Functional classification and annotation of the predicted SNPs were performed with the NGS-SNP tool56, using Ensembl (release 63), EntrezGene and Gene Ontology as reference databases. Additional information, such as descriptions of the affected transcripts and proteins, was also provided where applicable. Potential effects of single amino acid substitutions on protein function were predicted with SNAP44.
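
The intersection-and-filter step can be sketched as below. This is not the authors' custom script. It assumes minimal single-sample VCF lines with read depth stored as DP in the INFO field, keys each SNP by chromosome, position, reference and alternate allele, skips indels and multi-allelic sites, and requires both callers' records to pass the thresholds. The depth cutoff of 70 mirrors the Results; the quality cutoff of 30 in the example is illustrative only, and the function names are hypothetical.

```python
def parse_vcf(lines):
    """Map (chrom, pos, ref, alt) -> (qual, depth) for single-base substitutions."""
    records = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-allelic sites
        tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        depth = int(tags.get("DP", 0))
        records[(chrom, int(pos), ref, alt)] = (float(qual) if qual != "." else 0.0, depth)
    return records


def consensus_snps(samtools_lines, gatk_lines, min_qual, min_depth):
    """SNPs called by both tools whose quality and depth exceed the cutoffs in both."""
    a, b = parse_vcf(samtools_lines), parse_vcf(gatk_lines)
    kept = []
    for key in sorted(a.keys() & b.keys()):
        qual = min(a[key][0], b[key][0])
        depth = min(a[key][1], b[key][1])
        if qual > min_qual and depth > min_depth:
            kept.append(key)
    return kept


samtools = ["chrI\t100\t.\tA\tG\t50\t.\tDP=80",
            "chrI\t200\t.\tC\tT\t60\t.\tDP=40",
            "chrII\t300\t.\tG\tA\t70\t.\tDP=90"]
gatk = ["chrI\t100\t.\tA\tG\t45\tPASS\tDP=85;AF=1.00",
        "chrI\t200\t.\tC\tT\t60\tPASS\tDP=45",
        "chrII\t300\t.\tG\tA\t25\tPASS\tDP=95"]
print(consensus_snps(samtools, gatk, min_qual=30, min_depth=70))
# [('chrI', 100, 'A', 'G')]
```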

To confirm Y-12632 mutant protein sequences, queries were aligned to structural templates in the Protein Data Bank (www.rcsb.org/pdb/) using BlastP. Transcription factor binding sites in promoter sequences were identified by searching Yeastract (www.yeastract.com) and the Promoter Database of Saccharomyces cerevisiae (http://rulai.cshl.edu/SCPD).

Functional enrichment analysis

All genes with SNPs were classified by their GO term annotations. Functional enrichment analysis was performed with WebGestalt57, which implements the hypergeometric test for enrichment of GO terms and metabolic pathways among the candidate genes. The entire yeast S288C genome was used as the reference list. Terms with P < 0.05 were considered significantly enriched. P values were calculated using Fisher's exact test and adjusted by WebGestalt using the R function p.adjust.
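
For over-representation, the one-sided Fisher's exact test is the upper tail of the hypergeometric distribution: with N reference genes, K of them annotated to a term, and n candidate genes of which k carry the term, P = sum over i from k to min(K, n) of C(K, i)·C(N − K, n − i)/C(N, n). A minimal sketch follows. Benjamini-Hochberg is used for the adjustment as an assumption, because the text does not say which p.adjust method WebGestalt applied; the function names and input numbers are illustrative.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(hypergeom_sf(3, 10, 3, 3))                   # 1/120, about 0.0083
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # approximately [0.04, 0.0533, 0.0533, 0.5]
```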

Metabolic pathway analysis

KEGG IDs for each gene carrying functional SNPs in Saccharomyces cerevisiae were obtained, where applicable, by searching its protein sequence against the KEGG database with an E-value cutoff of 1e-5. Best hits were recorded, allowing genes to be mapped to KEGG pathways. The pathways related to the identified genes were then displayed using iPath58.

Genome-wide gene expression profiling using microarray

The genome microarray was fabricated with a 70-mer oligo set representing 6,388 genes using an OmniGrid 300 Gene Machine, with an embedded universal RNA reference and restricted quality control measurements28,59. Cultures of S. cerevisiae strain NRRL Y-12632 were treated with HMF at a final concentration of 30 mM 6 h after inoculation. In the time-course study, samples were taken at 0, 10, 30, 60 and 120 min after HMF treatment. Cultures grown under the same conditions without HMF served as a control. Two replicate experiments were carried out for each condition. Cell harvesting, RNA isolation, labeling and hybridization were carried out as previously described5,59,60. Microarray data were normalized using the quality control gene CAB and deposited in the Gene Expression Omnibus database under accession number GSE22939. Differentially expressed genes were identified using Significance Analysis of Microarrays (SAM)25 in time-course mode to find genes with a consistent increase over time. Significant gene changes were selected at an arbitrary SAM d-score cutoff of 0.23 or greater with the lowest false discovery rate (FDR < 0.1). All identified regulated genes were mapped to S. cerevisiae biological pathways in the KEGG database and annotated with the GO database.
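
For reference, the SAM statistic for one gene in the two-class case is d = (mean_treated − mean_control)/(s + s0), where s is the pooled standard error and s0 is a small positive constant that stabilizes genes with very low variance. The sketch below computes only this two-class form. SAM's time-course mode first reduces each gene's profile to a per-replicate summary (for example, a slope) before applying a statistic of this kind, and s0 is normally estimated from the data rather than supplied by hand. The function name and input values are illustrative.

```python
from math import sqrt
from statistics import mean


def sam_d(treated, control, s0):
    """Two-class SAM relative difference d = (mean_t - mean_c) / (s + s0)."""
    n1, n2 = len(treated), len(control)
    m1, m2 = mean(treated), mean(control)
    # Pooled within-group sum of squares, scaled as in the original SAM formulation.
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    ss = sum((x - m1) ** 2 for x in treated) + sum((y - m2) ** 2 for y in control)
    s = sqrt(a * ss)
    return (m1 - m2) / (s + s0)


print(sam_d([2.0, 4.0], [1.0, 1.0], s0=0.0))  # 2.0
print(sam_d([2.0, 4.0], [1.0, 1.0], s0=1.0))  # 1.0
```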

Growth response of single-gene-deletion mutants to HMF

Twenty-eight selected single-gene deletion mutants from the Saccharomyces Genome Deletion Sets were used to examine growth response to HMF. Because no deletion mutant library is currently available for strain Y-12632, gene knockout mutants of the parental wild-type strain BY4742 were used as analogues to verify the associated functions of the candidate genes.

The selected genes are mainly involved in the MAPK signaling pathway, the phosphatidylinositol signaling system and their immediate interactions, and include the available non-essential genes and transcription factor genes WSC2, INP54, MSN4, SWI6, MLP1, FKS1, DIG1, RLM1, INM2, WSC3, BCK1, STE7, SSK2, INP52, MLP2, KSS1, TEC1, FKS2, SSK22, FAB1, MSN2, DIG2, SHO1, SWI4, STE11, PLC1 and VPS34. Each strain was grown in 4 ml of synthetic medium in a 15-ml tube at 30°C with agitation at 250 rpm [10]. The initial OD600 of each deletion strain culture was adjusted to the same level, and cells were inoculated into medium with or without HMF at a final concentration of 20 mM. Cell growth was monitored by measuring absorbance at 600 nm (OD600; n = 3).