Introduction

Antisense transcription (transcripts from the opposite strand of a sense gene) is widespread in eukaryotes, from yeast to mammals 1. Studies in various organisms revealed that antisense transcripts are involved in degradation of the corresponding sense transcripts (RNA interference) 2. However, in Saccharomyces cerevisiae, components of the RNAi machinery are absent 3, and antisense repression can be mediated by transcription interference (TI) or histone deacetylation. TI was thought to be an unavoidable suppressive consequence of two convergent promoters directing transcripts that overlap for at least part of their sequences 4. But in the case of PHO84, the sense gene is regulated by accumulation of antisense RNAs, which leads to targeted histone deacetylation and the silencing of sense transcription 5. However, all the reported mechanisms rely on the non-coding antisense RNAs. Whether the protein-coding antisense gene can serve a regulatory role remains an open question. In this study, we identified a pair of functionally linked protein-coding sense and antisense genes, YCL058C (MDF1) (previously named as a dubious gene FYV5, whose function was thought to be required for yeast viability 5 6) and YCL058W-A (ADF1) in S. cerevisiae. Through extensive genetic, cytological, and biochemical experiments, we demonstrate that the regulation that YCL058W-A confers to YCL058C is not due to previously known mechanisms, but results from binding of YCL058W-A protein as a transcription repressor to the promoter region of YCL058C. Thus, our results reveal a new molecular mechanism of interaction between the sense and antisense pair.

In addition to the regulatory connection between this gene pair, the unusual property of the sense gene YCL058C itself also caught our attention. In our sequence comparative analysis, we did not find any significantly homologous open reading frame (ORF) of YCL058C in any other yeast species. Therefore, YCL058C probably originated de novo from a previously non-coding sequence in S. cerevisiae.

The origination of new genes, a fundamental process for all organisms, has been extensively studied in the past few years 7. The majority of newly evolved genes are derived from pre-existing genes, and their origination mechanisms include duplication and divergence, retrotransposition, exon shuffling, and lateral gene transfer 7. Completely de novo origination of a protein-coding gene from a non-coding sequence has been thought to be an almost impossible event, as stated by Susumu Ohno, “Each new gene must have arisen from an already existing gene” 8, and by François Jacob, “The probability that a functional protein would appear de novo by random association of amino acids is practically zero” 9. However, a number of de novo genes have recently been identified, mainly by Begun's group and ours 10, 11. These putative de novo genes have already generated intensive debate and discussion (e.g. Casci 12; http://richarddawkins.net/forum/viewtopic.php?f=4&t=45460). These controversial examples are supported only by the existence of putative ORFs and expressed sequences; direct evidence of their protein-coding capacity remains to be provided. Moreover, a concrete molecular mechanism or pathway has not been demonstrated for any young duplicated gene, let alone a de novo gene. The discovery of such a mechanism or pathway for a newly evolved gene would convincingly show the biological significance of the origin of new genes and contribute significantly to our mechanistic understanding of functional evolution in general.

Here we performed comprehensive evolutionary and experimental analyses on YCL058C and showed that this new gene is not only capable of encoding a protein but also performs essential cellular tasks in the mating pathway of S. cerevisiae. By binding the MATα2 protein, one of the determinants of yeast mating types, YCL058C suppresses yeast mating behavior and allows quick vegetative growth. As the previous name for YCL058C, FYV5 6, did not functionally distinguish it from its antisense gene, YCL058W-A, which nests on the antisense strand of YCL058C, we propose to name YCL058C MDF1 (Mating Depressing Factor 1) and its antisense partner YCL058W-A ADF1 (Antisense of Depressing Factor 1) to reflect the newly uncovered properties of YCL058C and the functional relationship between this gene pair.

Results

Both MDF1 and ADF1 are subject to selection and encode proteins

MDF1, with an ORF of 152 amino acids, is located on chromosome III of S. cerevisiae, while ADF1, with an ORF of 113 amino acids, nests completely on the opposite strand of MDF1. To initially test whether MDF1 and ADF1 are functional protein-coding genes in S. cerevisiae, we conducted an evolutionary analysis to examine whether they have been subject to functional constraint by estimating their nucleotide substitutions within and between yeast species 13. For MDF1, we conducted an intraspecies analysis and found that MDF1 is fixed in all 39 sequenced S. cerevisiae strains from geographically and ecologically diverse sources, with no frame-shift or nonsense polymorphisms, suggesting that the gene may be under functional constraint. The seven polymorphic sites in these 39 S. cerevisiae strains are all non-synonymous substitutions. This pattern differs significantly from the neutral expectation (P = 0.038) by Z test 13, implying positive selection on MDF1 and thus suggesting that MDF1 is functional. For ADF1, the ratios of non-synonymous to synonymous substitution rates among species in the sensu stricto group are significantly smaller than 1 (Supplementary information, Table S1), suggesting strong functional constraints on ADF1.
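The classification of polymorphic sites used in this kind of analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, the function names `classify_site` and `count_classes` are hypothetical, and the codon pairs in the usage note are made-up examples rather than the actual MDF1 polymorphisms. It does not reproduce the Z test cited above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_site(ref_codon, alt_codon):
    """Classify a codon change as synonymous, non-synonymous, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "non-synonymous"


def count_classes(codon_pairs):
    """Tally the classes over (reference codon, variant codon) pairs."""
    counts = {"synonymous": 0, "non-synonymous": 0, "nonsense": 0}
    for ref, alt in codon_pairs:
        counts[classify_site(ref, alt)] += 1
    return counts
```

For example, `count_classes([("ACA", "ACG"), ("ACA", "GCA"), ("TGG", "TGA")])` tallies one change of each class for these hypothetical codon pairs. A data set in which every polymorphism falls in the non-synonymous class, as reported for MDF1, is the pattern that the neutrality test then evaluates.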

We next examined the functionality of MDF1 and ADF1 by testing whether they are transcribed and translated in S. cerevisiae. Strand-specific RT-PCR showed that both MDF1 and ADF1 are expressed under normal conditions in S. cerevisiae; MDF1 transcripts were not detected in the other yeast species tested, whereas ADF1 was expressed throughout the sensu stricto group (Figure 1A). To obtain direct proof of the protein-coding capabilities of MDF1 and ADF1, 3HA and 13Myc tags were fused to the 3′-ends of MDF1 and ADF1, respectively. Western blot analyses detected positive signals (Figure 1B), demonstrating that MDF1 and ADF1 can encode proteins.

Figure 1

Both MDF1 and ADF1 are protein-coding genes. (A) Strand-specific RT-PCR showed that MDF1 was expressed only in S. cerevisiae in YPD medium, whereas ADF1 was expressed throughout the sensu stricto group species. ACT1 was used as the internal control. S.cer, S. cerevisiae; S.par, S. paradoxus; S.mik, S. mikatae; S.bay, S. bayanus. (B) Endogenous Mdf1p and Adf1p, tagged with 3HA and 13Myc respectively, were detected by western blotting; untagged yeast was used as the negative control and tubulin as a positive control.

MDF1 and ADF1 have antagonistic effects on growth in rich medium

Previous preliminary phenotypic screening analyses indicated that the MDF1Δ mutant appeared to show reduced growth in rich medium 14. This encouraging hint suggests that MDF1 or ADF1 may influence growth. To discriminate the functional effects of MDF1 and ADF1, we cloned MDF1 (M for short) and ADF1 (A for short) separately into the whole-locus deletion (M−A−) strain in the background of α cells of S. cerevisiae using the pRS316 vector. For the relatively short ADF1, the coding sequence plus the upstream flanking sequence of ADF1 could simply be used to construct the M−A+ strain. For the relatively long MDF1, a stop codon was introduced into the 5′-end of ADF1 by site-directed mutagenesis, without changing the coding ability of MDF1, to construct the M+A− strain. After genetically separating MDF1 and ADF1, we measured the influence of MDF1 and ADF1 on proliferation by both competition experiments and growth rate analyses at 30 °C in rich medium. The competition experiments showed that the M+A− strain grew more quickly than the wild-type strain, whereas growth defects were observed in both the M−A+ and M−A− strains (Figure 2A). In agreement with this finding, the M+A− strain grew faster in growth rate analyses, but the M−A+ and M−A− strains proliferated more slowly than the reference wild-type strain (Supplementary information, Figure S1A). The growth defects of the M−A− strain could be remedied by re-introducing both MDF1 and ADF1 (Figure 2A and Supplementary information, Figure S1A). In addition, the growth superiority of the M+A− strain was further supported by our two-dimensional gel electrophoresis data, which showed that some essential genes involved in energy and substance metabolism, such as ATP1, PGK1, MDH1 and SAM1, were distinctly increased in the M+A− strain compared with the wild type (Supplementary information, Figure S1B). The antagonistic effects of MDF1 and ADF1 on growth raise the possibility of a sense-antisense interaction. Therefore, we sought further evidence for this interaction, both phenotypically and mechanistically, in our next experiments.

Figure 2

Adf1p negatively regulates the expression of MDF1 by binding the promoter region of MDF1. (A) Competition experiments indicate that MDF1 and ADF1 have antagonistic effects on yeast growth, i.e., the M+A− (MDF1+ADF1−) strain grew much faster than the wild-type strain (**P < 0.01), whereas the M−A+ and M−A− strains grew worse than the wild-type strain (*P < 0.05). Histograms represent the clone numbers of mutants divided by the clone numbers of wild type. The values are averages of three independent experiments (with standard deviations). WT, wild-type strain; M−A−, strain with both MDF1 and ADF1 deleted; M−A+, strain with MDF1 deleted and ADF1 left; M+A−, strain with ADF1 deleted and MDF1 left; M−A− + M + A, strain with MDF1 and ADF1 simultaneously transformed back into the M−A− strain. (B) Overexpressed Adf1p inhibits the expression of MDF1 completely. WT, wild type; WT + ADF1, ADF1 was overexpressed in the wild-type background; ACT1, house-keeping gene as internal control. (C) Nuclear localization of Adf1p is visualized by an Adf1p-GFP fusion protein. (D) ChIP assays show that Adf1p binds the promoter region of MDF1. The final DNA extracts were amplified using a pair of primers that cover the promoter region (between −150 and +29 bp) of MDF1. IN, input; IP, immunoprecipitation; −Ab, control for non-specific binding in the absence of antibody; untag, untagged yeast as a negative control. (E) The model of a new sense-antisense interaction mechanism, in which the antisense-encoded protein (Adf1p) negatively regulates the expression of the sense gene (MDF1) by binding the promoter of the sense gene. The promoter region (between −150 and +29 bp) of MDF1 used for ChIP assays is indicated.
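The quantity plotted in panel (A), mutant clone number divided by wild-type clone number and averaged over three independent experiments, can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the counts in the usage note are invented placeholders, and the sample standard deviation is an assumption because the legend does not say which estimator was used.

```python
from statistics import mean, stdev


def competition_ratio(mutant_clones, wild_type_clones):
    """Clone number of the mutant divided by the clone number of wild type."""
    if wild_type_clones <= 0:
        raise ValueError("wild-type clone number must be positive")
    return mutant_clones / wild_type_clones


def summarize_replicates(replicates):
    """Return (mean, sample standard deviation) of the ratio over replicates.

    `replicates` is a list of (mutant_clones, wild_type_clones) pairs,
    one pair per independent experiment; at least two pairs are required.
    """
    ratios = [competition_ratio(m, w) for m, w in replicates]
    return mean(ratios), stdev(ratios)
```

With made-up counts such as `summarize_replicates([(120, 100), (130, 100), (110, 100)])`, a mean ratio above 1 would correspond to a strain that outcompetes the wild type, and a mean below 1 to a strain with a growth defect.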

Adf1p negatively regulates the expression of MDF1 by binding to its promoter

To examine if ADF1 has an effect on MDF1 expression, we overexpressed ADF1 using the inducible pYES3/CT vector in the wild-type S. cerevisiae. Strikingly, the sense (i.e. MDF1) transcripts could be completely abolished by overexpressed ADF1 (Figure 2B). Because the overexpressed ADF1 on the plasmid does not physically overlap with the chromosomal MDF1, transcription interference 4 is not a probable cause. This transcriptional suppression is instead probably due to the RNA or protein of ADF1 present in the cells. In view of the absence of RNAi machinery in S. cerevisiae 3, it is more likely that the repression occurred at the protein level. Subcellular localization of the Adf1p provides further support for a role as a transcription factor. By constructing a GFP-fusion plasmid to localize Adf1p within yeast cells, we observed that Adf1p resided in the nucleus (Figure 2C), representing a major characteristic of a transcription factor.

As it was of interest for us to explore whether the Adf1p could actually regulate the transcriptional activity of MDF1 as a transcription repressor, we subsequently performed chromatin immunoprecipitation (ChIP) assay to investigate the direct association of Adf1p with the MDF1 promoter in a yeast strain overexpressing His-tagged Adf1p. The ChIP results show that Adf1p does bind to the upstream region of MDF1 (Figure 2D). Taken together, these results strongly support a novel mechanism of sense-antisense interaction, in which the antisense-encoded protein negatively regulates the expression of the sense gene by binding to the promoter of the sense gene (Figure 2E).

Mdf1p significantly decreases the mating efficiency of α cells

In an attempt to uncover the mechanism underlying the rapid growth of the M+A− strain, we conducted global microarray analyses of the M+A−, M−A+ and wild-type strains. Unexpectedly, our microarray data indicated that most of the genes down-regulated in the M+A− strain, in comparison with the wild-type and M−A+ strains, are enriched in the yeast mating pathway (Figure 3A and Supplementary information, Table S2). Our quantitative mating assays further confirmed that the M+A− strain was substantially less successful in mating than the wild-type α strain (P < 0.01), whereas the mating efficiencies of the M−A+ and M−A− strains were comparable to that of wild type (Figure 3B). Therefore, it is reasonable to hypothesize that MDF1 plays a role in the mating pathway.

Figure 3

Mdf1p significantly decreases the mating efficiency of α cells. (A) Most of the genes found to be down-regulated in the M+A− strain by array analyses were associated with the yeast mating pathway. WT, wild-type strain; M−A+, strain with MDF1 deleted and ADF1 left; M+A−, strain with ADF1 deleted and MDF1 left. (B) The mating efficiency tests demonstrate that the M+A− strain mated far worse than the wild-type α strain (**P < 0.01), while the mating efficiencies of the M−A+ and M−A− strains were comparable to that of wild-type α cells. The mating defect of the M+A− strain cannot be rescued by deleting HMRa in the M+A− strain (M+A−-HMRa) (**P < 0.01). (C) Key components of the MAPK pathway were significantly down-regulated. Genes down-regulated more than three-fold are marked in green. The MAPK pathway information for S. cerevisiae was downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/). (D) The significantly down-regulated genes involved in the MAPK pathway were confirmed by semi-quantitative RT-PCR.

The mating pathway (mitogen-activated protein kinase (MAPK) pathway) is currently one of the best-characterized pathways in yeast 15. Three distinct cell types exist in S. cerevisiae: haploid cell types a and α, and diploid cell type a/α. The MAT loci encode master regulators of cell type: MATa1 is encoded by the MATa locus, present in a cells and diploids, while MATα1 and MATα2 are encoded by the MATα locus, present in α cells and diploids. In a cells, a-specific and haploid-specific genes function by default, so MATa1 makes no contribution. In α cells, the MATα1 protein turns on α-specific genes, including STE3, the entrance of the MAPK pathway; the MATα2 protein turns off a-specific genes, while haploid-specific genes function normally. In response to mutual pheromone stimulation, the mating pathway is triggered, and a and α cells can then fuse to form diploids. In diploid cells, the MATα2 protein still turns off a-specific genes, while MATa1 and MATα2 dimerize to suppress haploid-specific genes, including MATα1. The diploid cells can undergo meiosis and give rise to a or α haploids when fermentable carbon and nitrogen sources are scarce.

When scrutinizing our array data, we found that MATα1 was almost completely suppressed and all the known haploid-specific genes, some of which are key components of the MAPK signaling pathway (Figure 3C), were among those significantly down-regulated in the M+A− strain. The a-specific genes were off, consistent with the observation that the transcription level of MATα2 remained relatively normal. Meanwhile, the α-specific genes were off as well, which was an unavoidable consequence of suppressed MATα1 gene expression. The down-regulation of these genes revealed by microarray results was validated by semi-quantitative RT-PCR (Figure 3D). Therefore, the M+A− cells, which are physically α haploids, behave more like diploids, in which the a-, α- and haploid-specific genes are all shut down or down-regulated.

In addition to the MAT locus, S. cerevisiae carries two unexpressed but complete copies of the mating-type genes, HMLα and HMRa, which are usually transcriptionally silenced 16. One explanation for the above unusual expression pattern is the abnormal activation of the cryptic mating-type loci HMLα and HMRa, which can lead to the coexpression of a and α information in haploid cells 17. However, our array data showed that HMLα and HMRa remained silenced in the M+A− strain. The unchanged mating inhibition phenotype observed when HMRa was deleted in the M+A− strain (Figure 3B) further ruled out the possibility that Mdf1p activates the silent mating cassette HMRa and allows the a1/α2 suppressor to be formed. Therefore, the simplest mechanistic explanation for this pseudo-diploid phenotype is that Mdf1p in α cells may bind the MATα2 protein, similar to what MATa1 does in diploid cells. The first piece of evidence that Mdf1p mimics MATa1 came from the secondary structure of Mdf1p predicted by the online protein structure prediction server PORTER (http://distill.ucd.ie/porter/) 18. Similar to the MATa1 protein, Mdf1p appears to be a three-helix-bearing protein, which is the foundation for binding MATα2 and the targeted DNA 19.

Mdf1p regulates the mating pathway of S. cerevisiae by binding MATα2

If the Mdf1p-MATα2 interaction hypothesis is right, we would expect that Mdf1p functions upstream of MATα1, which is targeted by the a1/α2 heterodimer in diploid cells, and that Mdf1p functions differently in a and α cells because of the absence of MATα2 in a cells. To test the first deduction, we overexpressed MATα1 in the M+A− strain in α cells. As we anticipated, the M+A− strain recovered much of its mating ability (Figure 4A). To test the second deduction, we deleted ADF1 alone in a cells and found that the mating efficiency of the M+A− (a) strain was not affected as strongly as that of the M+A− (α) strain (Figure 4A). Furthermore, to better understand the mechanistic aspects of Mdf1p, we examined the subcellular localization of Mdf1p by adding GFP to the C-terminus of Mdf1p. The fluorescence showed that Mdf1p is present in both the cytoplasm and the nucleus (Figure 4B), which does not conflict with our Mdf1p-MATα2 interaction hypothesis. Hence, on the whole, the above evidence matches the proposed role of Mdf1p as a transcriptional suppressor of the mating pathway.

Figure 4

Mdf1p regulates the mating pathway of S. cerevisiae by binding the MATα2 protein. (A) Mdf1p functions upstream of MATα1. Overexpression of the α1 protein partially rescues the mating defect of the M+A− strain in α cells, and the M+A− strain exhibits relatively normal mating efficiency in the background of a cells lacking MATα1. WT (α), wild type (α cells); M+A− (α), ADF1 was deleted in α cells; M+A− (α) + α1, α1 was overexpressed in M+A− (α); M+A− (a), ADF1 was deleted in a cells. (B) Mdf1p is localized in both the nucleus and the cytoplasm. (C) Yeast two-hybrid assays show that Mdf1p interacts with the α2 protein in vivo. 1. P53-SV40 as the positive control; 2. Mdf1p fused with the DNA-binding domain of Gal4 (DB) as negative control; 3. MATα2 fused with the activation domain of Gal4 (AD) as negative control; 4. Mdf1p-MATα2 interaction. Four independent clones were patched on the selective plates. (D) Pull-down assays show that Mdf1p physically binds to the MATα2 protein in vitro. Purified His-tagged Mdf1p was incubated with MATα2 fused to GST or with GST alone, and was detected by western blotting using a mouse anti-6XHis-tag monoclonal antibody. Twenty percent of the purified His-tagged Mdf1p used for each pull-down reaction is shown as input.

To obtain direct evidence for the Mdf1p-MATα2 interaction, we employed yeast two-hybrid assays. Mdf1p was fused with the DNA-binding domain of Gal4 (DB) and the MATα2 protein was fused with the activation domain of Gal4 (AD). The yeast two-hybrid assay results suggest that Mdf1p can interact with the MATα2 protein in vivo (Figure 4C). In vitro GST pull-down assays were carried out to further substantiate the results of the yeast two-hybrid assays. MATα2 was expressed as a GST-fusion protein in E. coli, while Mdf1p was expressed as a His-fusion protein in yeast. Figure 4D shows that MATα2 and Mdf1p physically interact with each other in vitro. Overall, both the yeast two-hybrid and GST pull-down assays support the Mdf1p-MATα2 interaction hypothesis.

Mdf1p and MATα2 cooperatively bind to the haploid-specific gene operator

Having established that Mdf1p and MATα2 can interact, we next investigated whether MATα2 and Mdf1p co-bind to the regulatory DNA elements that control haploid-specific genes. ChIP assays were carried out for 10 known haploid-specific genes (MATα1, STE4, STE5, STE18, FUS1, FUS2, FUS3, GPA1, SST2, and RME1) 20 using the antibody against Mdf1p-6XHis. With the exception of STE18, our ChIP experiments successfully recovered the promoters of all the genes (Figure 5A), indicating that Mdf1p specifically contacts the haploid-specific genes. We further used in vivo electrophoretic mobility shift assays (EMSAs) to confirm this result. In the traditional MATa1-MATα2 model, the role of the a1 and α2 proteins is to recognize a roughly 20-bp motif, called the haploid-specific gene (hsg) operator, and suppress the expression of the cognate gene; recognition of the hsg operator requires both a1 and α2 proteins 21. We chose the most conserved reported motif 21, labeled with biotin, to test its affinity for the Mdf1p/MATα2 complex, assuming that Mdf1p takes the role of MATa1. Consistently, in our EMSA experiments no detectable binding to the motif was observed when only Mdf1p (Figure 5B, lane 8) or only the MATα2 protein (Figure 5B, lane 9) was present in the nuclear extracts, whereas Mdf1p and MATα2 cooperatively bound to the biotin-labeled motif when nuclear extracts prepared from the M+A− strain (α cells) were used (Figure 5B, lane 3). These results indicate that Mdf1p and MATα2 also function in a mutually dependent manner. More importantly, the Mdf1p-6XHis antibody and the MATα2-Flag tag antibody separately supershifted the band in an antibody concentration-dependent manner (Figure 5B, lanes 4-7), indicating that Mdf1p and MATα2 are indispensable components of the binding complex. The next and even more challenging task was to find the precise binding site of Mdf1p within the hsg operator. We tried a series of mutated probes labeled with biotin. When we mutated the α2-half sites or the linker between the a1- and α2-half sites, no shifted bands could be observed (Figure 5C, lanes 1 and 2), indicating that, as in the a1-α2 heterodimer, the α2 protein in the Mdf1p-α2 heterodimer still binds to the α2-half sites, and the linker between the two halves is also crucial in aiding the binding. However, contrary to the simple expectation, the position of Mdf1p on the hsg operator is not the original a1-half site (Figure 5C, lane 3), but is shifted four nucleotides away from the a1-half site (Figure 5C, lane 4). These results (Figure 5D), combined with other data, strongly support the conclusion that the Mdf1p and MATα2 proteins bind to each other and jointly regulate the haploid-specific genes. From the convergent evidence obtained so far, a model for the function of Mdf1p in the mating pathway can be drawn, as shown in Figure 5E. By binding MATα2, the central component of the yeast mating pathway, and cooperatively targeting the hsg operator with MATα2, Mdf1p prevents MATα1 and other haploid-specific genes from opening the MAPK pathway, which is responsible for triggering intracellular mating signal transduction, and consequently decreases the mating efficiency of S. cerevisiae.

Figure 5

Mdf1p and MATα2 cooperatively bind to the haploid-specific gene operator. (A) Results of ChIP show that Mdf1p-MATα2 can bind to the promoters of haploid-specific genes. The final DNA extracts were amplified using a pair of primers that cover the 200-bp upstream flanking region of each haploid-specific gene. IN, input; IP, immunoprecipitation; −Ab, control for non-specific binding in the absence of antibody; untag, untagged yeast as a negative control. 1. MATα1, 2. STE4, 3. STE5, 4. FUS1, 5. FUS2, 6. FUS3, 7. GPA1, 8. SST2, 9. RME1, 10. STE18. (B) Separate and cooperative DNA binding activities of Mdf1p and MATα2 to the 3′ biotin-labeled, double-stranded hsg operator probe were measured by EMSA. Nuclear extracts of M+A− (α type) cells with the Mdf1p-6XHis tag and the MATα2-Flag tag (lanes 1-7) were used to analyze the DNA binding activities of the Mdf1p-MATα2 complex. The specificity of the binding is demonstrated by competition with a 200-fold excess of the cold probe (lane 1) and unrelated DNA (lane 2) compared with the normal shifted band (lane 3). Lanes 8-10 represent nuclear extracts of M−A+ cells (α type), nuclear extracts of M+A− cells (a type), and no nuclear extract, respectively. Lanes 4 and 5 represent super-shift experiments after the addition of 0.5 and 1 μg of His tag monoclonal antibody, whereas lanes 6 and 7 represent super-shift experiments after the addition of 0.5 and 1 μg of Flag tag monoclonal antibody. (C) EMSAs using a series of mutated biotin-labeled probes show that the α2 protein in the Mdf1p-α2 heterodimer still binds to the α2-half sites, and the linker between the two halves is crucial in aiding the binding. The position of Mdf1p on the hsg operator is not the original a1-half site, but is shifted four nucleotides (Mdf1p-half site) away from the a1-half site. (1) Probe with α2-half sites mutated, (2) probe with linker mutated, (3) probe with a1-half sites mutated, (4) probe with four nucleotides flanking the a1-half sites mutated, (5) probe without mutation, (6) cold competition. (D) Model for the DNA binding features of the a1-α2 and Mdf1p-α2 heterodimers. In the a1-α2 heterodimer, the a1 and α2 proteins bind the a1-half (pink) and α2-half (blue) sites, respectively; in the Mdf1p-α2 heterodimer, the Mdf1p and α2 proteins bind the Mdf1p-half (green) and α2-half (blue) sites, respectively. (E) A model for the functions of Mdf1p and Adf1p in the mating pathway. Mdf1p and MATα2 are physically cross-linked to the promoters of haploid-specific genes and of MATα1, which is in charge of opening α-specific genes, thereby repressing the MAPK pathway, which is responsible for triggering a series of physiological changes in preparation for mating. To prevent the concomitant side effect of Mdf1p, the expression of MDF1 is negatively regulated by the transcriptional repressor Adf1p encoded by its antisense strand.

Computational and experimental analyses strongly support that MDF1 is most likely a de novo gene in S. cerevisiae while ADF1 is conserved across species

The next important question deserving close investigation is the origination process of both genes. We searched the UniRef90 protein dataset using PSI-BLAST, and found that ADF1 is conserved in all the sequenced members of the hemiascomycete subdivision of fungi except the most distant clade, Yarrowia lipolytica (Supplementary information, Figure S2). ADF1 therefore originated at least before the separation of S. cerevisiae from the CTG clade 300 million years ago (mya) 22 (Figure 6A). By contrast, MDF1 has no significantly homologous ORF in any other organism, except for two short truncated ORFs in the close relatives S. bayanus and S. mikatae (Supplementary information, Figure S3). The flanking genes of MDF1 in S. cerevisiae, KRR1 and YCL057C-A, are both conserved across fungi. This gene order is maintained in all 13 sequenced hemiascomycete species from S. cerevisiae to Ashbya gossypii (Supplementary information, Figure S4). When we manually aligned the intergenic region between these two flanking genes in other species, we found that this region could not encode a protein in any other species, owing to the presence of multiple stop codons and frame-shifting indels (Supplementary information, Figure S5). However, it is still theoretically possible that the homologous sequences maintain some ancestral function in other species in a way that circumvents the stop codons by nonsense suppression (read-through of stop codons), or that the truncated ORFs are functional. To test these alternative hypotheses, we first tested whether MDF1 and ADF1 are transcribed and translated in other sensu stricto species. Our strand-specific RT-PCR experiments showed that MDF1 is expressed only in S. cerevisiae, while ADF1 is expressed throughout the sensu stricto group (Figure 1A). When His-tags were fused to the 3′-ends of the homologous sequences of MDF1 in S. paradoxus, S. mikatae and S. bayanus, and to the 3′-ends of the short ORFs in S. mikatae and S. bayanus, no protein could be detected under the same conditions (Figure 6B). Second, to specifically test the possibility of read-through of stop codons, we replaced MDF1 in S. cerevisiae with the homologous sequences from S. bayanus, S. mikatae and S. paradoxus containing stop codons and indels (Supplementary information, Figure S5); no reduction in mating efficiency was observed in any of these substituted strains (Figure 6C). We also experimentally replaced the ACA in the 3′-end of MDF1 in S. cerevisiae with TGA (a stop codon in S. bayanus and S. kudriavzevii) (Supplementary information, Figure S5), and observed that the truncated Mdf1p was unable to cause the mating defect (Figure 6C). All the analyses fit the hypothesis that the sequences homologous to MDF1 in other species are non-coding and that the intact ORF of MDF1 is indispensable for its action as a regulator of mating processes.
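The sequence-level reasoning above (premature stop codons and frame-shifting indels disrupt the reading frame, and an ACA to TGA replacement truncates the protein) can be illustrated with a short sketch. The helper names are hypothetical, and the sequences in the usage note are toy strings chosen for illustration, not the MDF1 locus or its orthologous regions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(seq, frame=0):
    """Return the 0-based position of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_intact_orf(seq):
    """True if seq starts with ATG, has a length that is a multiple of three,
    and its only in-frame stop codon is the final codon."""
    seq = seq.upper()
    if len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    return first_stop_index(seq) == len(seq) - 3
```

On a toy sequence such as `"ATGACAGCATAA"`, `is_intact_orf` returns `True`; replacing the ACA codon with TGA (`"ATGTGAGCATAA"`) introduces a premature stop, and inserting a single base (`"ATGAACAGCATAA"`) breaks the frame, so both variants return `False`. A check of this kind only addresses coding potential in one frame; it says nothing about read-through, which is why the experimental tests described above were needed.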

Figure 6

Comparative genomics and experimental analyses support that MDF1 is most likely a de novo originated gene in S. cerevisiae while ADF1 is conserved across species. (A) The phylogenetic tree 24 illustrates our hypothesis that MDF1 emerged specifically in S. cerevisiae, while ADF1 is conserved in all the sequenced members of the hemiascomycete subdivision of fungi except the most distant clade, Yarrowia lipolytica, which has almost completely lost synteny with S. cerevisiae. The red and yellow stars denote the origination events of MDF1 and ADF1, respectively. (B) Western blotting results showed that there is no Mdf1p in the other sensu stricto group species. His-tags were fused to the 3′-ends of the homologous sequences of MDF1 in S. paradoxus, S. mikatae and S. bayanus, and to the 3′-ends of the shorter ORFs in S. mikatae and S. bayanus, respectively. Tagged Mdf1p in S. cerevisiae was used as the positive control and tubulin as the loading control. 1. S. cerevisiae, 2. S. paradoxus, 3. S. mikatae, 4. S. bayanus, 5. Shorter ORF in S. mikatae, 6. Shorter ORF in S. bayanus. (C) Mating assays show that the intact MDF1 is required for regulation of the mating process. When we replaced MDF1 in S. cerevisiae with the homologous sequences of S. bayanus, S. mikatae, or S. paradoxus, or replaced the ACA of MDF1 in S. cerevisiae with TGA (a stop codon in S. bayanus and S. kudriavzevii), no mating defect was observed. WT, S. cerevisiae wild type; M+A− (C), M+A− in S. cerevisiae; M+A− (P), M+A− (M), M+A− (B), MDF1 was replaced with the homologous sequences of S. paradoxus, S. mikatae, and S. bayanus, respectively; MstopA− (C), ACA of MDF1 in S. cerevisiae was replaced with TGA.

However, on account of reported widespread multiple gene losses in yeast after the whole genome duplication (WGD) event 23, more proof is still desired to distinguish between evolutionary innovation and multiple losses in evolution of MDF1. Logically, if MDF1 was an old gene lost in other species, we would have to assume at least nine independent losses in 13 sequenced hemiascomycete lineages based on the phylogeny (Figure 6A). This is in sharp contrast to the fact that most gene-loss events were confined to duplicated copies after whole genome duplications, which was after the split of the lineage leading to S. cerevisiae from K. lactis about 100 mya 24, and that the most extreme and rare multiple gene-loss cases only have independent gene losses in three or four lineages 25. In addition, we reconstructed the ancestral consensus sequence of the region that corresponds to S. cerevisiae's MDF1 gene based on the sequences from the sensu stricto species, and found that there were at least two stop codons and two frames-shifting indels in the common ancestor (Supplementary information, Figure S5), indicating that it is unlikely that the MDF1 gene was lost in all the other species but remained intact only in S. cerevisiae. Overall, all the above comparative genomics and experimental data favor the hypothesis that MDF1 evolved through de novo origination rather than multiple losses or extension of the ancestral short functional sequences.

Discussion

New mechanism of sense-antisense interaction

One of our remarkable findings is that the way in which Adf1p regulates MDF1 fits none of the known sense-antisense interaction mechanisms, i.e., RNAi, transcription interference (TI), or antisense RNA-induced histone deacetylation. The following three pieces of evidence demonstrate that the traditional explanations for the sense-antisense interaction cannot be applied to the Adf1p case. First, when we introduced a stop codon into the N-terminus of Adf1p by site-directed mutagenesis to construct the M+A− strain, Adf1p was eliminated, but the ADF1 RNA was not (data not shown). If the regulation were RNA-dependent, the M+A− strain, in which both RNAs exist, should behave like the wild type, but this is not the case. Thus, RNAi and antisense RNA-induced histone deacetylation can be ruled out. Second, the plasmid-borne overexpressed ADF1, which could completely abolish the sense (i.e., MDF1) transcript, does not overlap with the chromosomal ADF1. Therefore, TI can also be dismissed. Third, together with our ChIP results for Adf1p, these observations lead us to put forward a new sense-antisense regulation mechanism, in which Adf1p represses the transcription of MDF1 by binding to the promoter region of MDF1. This finding widens our understanding of gene regulation and of how species with compact genomes use genetic material economically.

If MDF1 is a de novo gene and ADF1 is conserved across all hemiascomycetes, what function did ADF1 serve prior to the origination of MDF1? To obtain some preliminary hints about the original function of ADF1, we deleted ADF1 in S. paradoxus, which does not possess a functional MDF1, and observed defective growth (Supplementary information, Figure S6A). We also sequenced many DNA fragments obtained through ChIP for Adf1p in S. cerevisiae. In addition to the MDF1 promoter, we obtained the promoters of a number of other old genes that are unrelated to mating (Supplementary information, Table S3). Some of these genes with multiple hits in our shotgun-clone sequencing play roles in pre-rRNA processing, cell wall formation or mitochondrial morphology. Therefore, ADF1 probably had ancestral functions as a transcription factor and was later recruited to repress MDF1 in S. cerevisiae. More studies are needed to address the detailed original functions of ADF1, which will shed further light on the evolution of pathways.

MDF1 is an unprecedented example for the de novo origination of a protein-coding gene, leading to additional novel gene function and pathway evolution

Various mechanisms underlying gene origination have been revealed in genes reported recently 7. So far, only a few new Drosophila genes, such as Jingwei and Sphinx, have received evidence for possible functions 26, 27. One of the most striking findings in this study is that MDF1, most likely generated de novo from a non-coding sequence, plays very important roles in two fundamental biological processes, namely mating and growth. To our knowledge, this is the first study to provide solid evidence that a de novo originated gene can truly encode a protein and play important roles in basic biological processes.

Moreover, the evolution of the intricate pathways upon which natural selection acts is a central and long-standing issue in evolutionary studies. So far, no pathway involving a new gene has been reported. Here we present evidence that a de novo originated gene, MDF1, can be integrated into the yeast mating pathway at the farthest upstream position. The uniqueness of MDF1 lies not only in its novel association with a fundamental pathway but also in the position at which it has been recruited: MDF1 impacts the mating pathway from the very beginning by binding the initiator of the mating process, although it would seem more plausible for a newly evolved gene to be recruited at the downstream nodes of a pathway. Our analyses of MDF1 enrich our understanding of pathway evolution.

Roles of Mdf1p in mating and growth of S. cerevisiae and implications on evolution of the baker's yeast

Yeasts can reproduce both sexually and asexually (facultative sex); selective forces might have favored either vegetative fitness or mating ability under different conditions, and a negative correlation between these two traits might exist 28. Recruiting a new component, Mdf1p, into the mating pathway may make yeast better able to balance the gains and costs of these two physiological processes. Under benign conditions, especially after the haploids recover from growth arrest under unfavorable conditions, vegetative proliferation is advantageous for rapid resource consumption, and Mdf1p shuts down the mating pathway to limit the cost of mating; S. cerevisiae is thus at a selective advantage relative to more efficiently mating but slower-growing competitors. Under stressful conditions, mating is favorable and MDF1 is suppressed by Adf1p to gain the benefit of sexual reproduction. The new regulatory circuit involving this sense-antisense gene pair might have aided S. cerevisiae in adapting to a changing environment.

When the mating pathway is stimulated by a pheromone secreted by a nearby cell of the opposite mating type, yeast cells undergo a series of physiological changes in preparation for mating 15. These include arrest in the G1 phase of the cell cycle. Mdf1p is able to promote growth and decrease the mating efficiency of S. cerevisiae simultaneously. One possible connection between growth and mating is that, by binding the MATα2 protein and further silencing the downstream haploid-specific genes, which sends a false diploid signal, Mdf1p may push yeast cells away from cell cycle arrest and thus accelerate mitotic cell growth. This hypothesis is consistent with a recent conclusion that a growth-rate advantage can be gained by losing signaling at multiple points in the mating pathway 29. However, it is noteworthy that the comprehensive molecular mechanism by which Mdf1p promotes growth is still not clear. The GFP fusion protein assay showed that Mdf1p exists in both the cytoplasm and the nucleus (Figure 4B), while Mdf1p binding to MATα2 could only explain the nuclear localization of Mdf1p. Therefore, it is plausible to assume the existence of additional MATα2-independent factor(s) that interact with Mdf1p in the cytoplasm. Our microarray and two-dimensional electrophoresis data show that many metabolic genes are influenced by Mdf1p (Figure 3A and Supplementary information, Figure S1B), but how these effects arise remains unclear to us. Future studies are needed to reveal the detailed pathway/network through which Mdf1p promotes growth.

However, unrestrained growth that is not coordinated with internal and external conditions is not always beneficial to yeast cells. In fact, in contrast to the superiority of the M+A− strain in rich medium, Mdf1p is unable to promote growth in nonfermentative medium (Supplementary information, Figure S6B), in which sexual reproduction is advantageous 28. Hence, the non-mating haploid M+A− strain may not cope well with harsh nutritional conditions owing to its low efficiency of sporulation, which is a normal strategy for resisting adverse circumstances. In addition, as revealed by the microarray data, the M+A− strain seems to exhibit some defects in DNA damage repair due to the inhibited transcription of some DNA damage regulators (Figure 3A). Therefore, MDF1 should be under stringent control to avoid its side effects. Opportunely, MDF1 recruited its antisense gene ADF1 as a negative regulator. Interestingly, as demonstrated by previous microarray analysis, the expression level of ADF1 fluctuates 30. This pattern hints that the regulatory circuit is dynamic in response to changes in physiological conditions in the wild type. We anticipate that more studies on both MDF1 and ADF1 will lead to an integrated understanding of how MDF1 and ADF1 regulate growth and other biological processes.

Materials and Methods

Competition experiment

The deletion strains, including M−A−, M+A−, M−A− in the background of S. cerevisiae and A− in S. paradoxus, contain genetic markers conferring resistance to geneticin through the kanMX4 cassette inserted into a deleted chromosomal MDF1 locus. To obtain differently marked competitors, we introduced nourseothricin resistance into the wild types S. cerevisiae (α cells) and S. paradoxus (α cells) by inserting natMX4 into the HO locus. Previous experiments established that these markers are neutral compared to unmarked wild-type strains 31. The competition experiments were carried out as follows: equal volumes of overnight cultures of the competing pairs were mixed. After 24 h of competition, the mixed cultures were printed onto two selective agar media supplemented with geneticin or nourseothricin. The resulting surface cultures were photographed and the colonies were counted after 48 h of incubation at 30 °C.

Mating efficiency test

The efficiency of mating was determined as follows (modified from Hartwell 32): cells were grown in YPD broth to a density of 3 × 10⁷ cells per ml. The α-cell cultures to be tested were mixed 100:1 with the a-cell cultures at room temperature. The mating cultures were spread on selective medium (Met− and Lys−) at 30 °C to determine the number of diploids and on a different selective medium (Met−) to determine the number of haploids. The mating efficiency is defined as the number of diploids observed on the first selective medium divided by the number of haploids observed on the second selective medium.
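The ratio defined above can be written as a short calculation. The following Python sketch is illustrative only: the function name and the colony counts are hypothetical and are not data from this study, and it assumes that both plates were spread from the same dilution and volume, so that the raw counts are directly comparable.

```python
def mating_efficiency(diploid_colonies: int, haploid_colonies: int) -> float:
    """Diploids counted on the Met- Lys- plate divided by haploids
    counted on the Met- plate (the definition given in the text)."""
    if diploid_colonies < 0:
        raise ValueError("diploid colony count cannot be negative")
    if haploid_colonies <= 0:
        raise ValueError("haploid colony count must be positive")
    return diploid_colonies / haploid_colonies


# Hypothetical counts, chosen only to show the arithmetic.
example = mating_efficiency(diploid_colonies=45, haploid_colonies=300)
print(f"mating efficiency = {example:.2f}")
```

If the two plates were spread from different dilutions, each count would first need to be scaled by its dilution factor before taking the ratio.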

Yeast two-hybrid assay

All procedures essentially followed the Yeast Protocols handbook (Clontech). Briefly, the coding sequence of MDF1 was fused with the DNA-binding domain of Gal4 (DB) and the coding sequence of MATα2 was fused with the activation domain of Gal4 (AD). These two plasmids were then co-transformed into the host strain Y190 (MATa, gal4-542, gal80-538, his3, trp1-901, ade2-101, ura3-52, leu2-3,112, URA3::GAL1-LacZ, Lys2::GAL1-HIS3cyhr) to identify interaction, or transformed separately as negative controls. After selection on SD–Trp–Leu–His plates, 5-Bromo-4-chloro-3-indolyl-β-D-galactoside (X-gal) was added to evaluate the strength of interaction.

GST pull-down assay

The MATα2 coding sequence with the stop codon was cloned into the pGEX-4T-1 vector to generate the GST-MATα2 plasmid. The GST-MATα2 plasmid was transformed into the E. coli strain BL21. Expression of the fusion protein was induced by adding IPTG to a final concentration of 1 mmol/ml, followed by incubation at 16 °C for 4 h. After lysis of the bacterial cells by sonication, GST or GST-MATα2 fusion protein was immobilized on glutathione-Sepharose 4B beads according to the manufacturer's (GE Healthcare) instructions. The beads were washed three times with cold PBS. Yeast cells (100 OD) overexpressing 6XHis-tagged MDF1 were lysed in 500 μl of lysis buffer (50 mM Tris-HCl (pH 7.5), 150 mM NaCl, 1 mM EDTA, 1 mM PMSF, protease inhibitor cocktail (Roche), 0.2 mM Na3VO4, 100 mM NaF, 0.2% NP-40) by glass bead beating and centrifuged at 12 000 × g for 5 min at 4 °C. 200 μl of supernatant was incubated with 20 μl of GST or GST-MATα2 immobilized on glutathione-Sepharose 4B beads overnight at 4 °C. After incubation, the beads were washed with lysis buffer four times. The bound proteins were analyzed by western blotting using anti-6XHis tag antibody (R&D Systems).

Chromatin immunoprecipitation assay

100 ml of cells overexpressing Mdf1p or Adf1p (2.0 × 10⁷ cells/ml) was crosslinked with 2% formaldehyde for 15 min at room temperature. Glycine was added to a final concentration of 250 mM, and the incubation continued for an additional 5 min. The suspension was sonicated seven times for 10 s each, with the amplitude set at 30% using an ultrasonic processor (Sonic ultracell). Samples were incubated on ice for 2 min between sonications. The suspension was clarified by centrifugation for 5 min at 10 000 × g at 4 °C in a microcentrifuge. 1 μl of RNase (10 μg/μl) was added to the samples, and they were incubated for 30 min at 37 °C. Afterwards, sheared chromatin was purified using QIAquick spin columns (Qiagen). Then 250 μl of supernatant was incubated with 15 μl of anti-His monoclonal antibody (R&D Systems). The promoter primers of 10 haploid-specific genes (MATα1, STE4, STE5, STE18, FUS1, FUS2, FUS3, GPA1, SST2, RME1) residing in the 200-bp upstream flanking region of each gene were used for the PCR analysis. The MDF1 promoter primers used were as follows: MDF1-Chip-Fwd, 5′-TAG TCT TAA GCG ACG ATG CTT TAT-3′, and MDF1-Chip-Rev, 5′-CAG AAA AAT CAA AAA CAA ACG ACA G-3′, which flank the −150 bp to +29 region of the MDF1 gene.
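As a quick arithmetic check on the primer description above, the following Python sketch computes the length of the −150 to +29 region and the lengths of the two primers as listed. It is illustrative only: the helper names are hypothetical, and it assumes the common convention that coordinates are numbered relative to +1 with no position 0. Because the primers are described as flanking this region, the actual amplicon may be somewhat longer than the value computed here.

```python
def region_length(start: int, end: int) -> int:
    """Inclusive length of a region given in gene-relative coordinates.

    Assumption: positions run ..., -2, -1, +1, +2, ... (no position 0).
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this convention")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the span crosses the missing position 0
    return length


def primer_length(primer: str) -> int:
    """Count bases in a primer written in space-separated triplets."""
    return len(primer.replace(" ", ""))


# Primer sequences exactly as listed in the text (5' to 3').
MDF1_CHIP_FWD = "TAG TCT TAA GCG ACG ATG CTT TAT"
MDF1_CHIP_REV = "CAG AAA AAT CAA AAA CAA ACG ACA G"

print(region_length(-150, 29))       # length of the -150..+29 region
print(primer_length(MDF1_CHIP_FWD))  # forward primer length in nt
print(primer_length(MDF1_CHIP_REV))  # reverse primer length in nt
```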

Electrophoretic mobility shift assay (EMSA)

EMSA was performed using nuclear extracts from M+A− (α type) cells coexpressing Mdf1-6XHis tag and MATa2-Flag tag as described previously 33 with modified extraction buffer (HEPES, pH 8.0, 20 mM, NaCl 400 mM, EDTA 1 mM, DTT 1 mM, NP-40 1%, glycerol 10%, protease inhibitor cocktail). The oligonucleotide probes of the hsg operator labeled with 3′-biotin are listed in Supplementary information, Table S4. For antibody supershift assays, anti-6XHis tag monoclonal antibody or anti-Flag tag monoclonal antibody was incubated for 30 min followed by EMSA procedures 21 using North2South® Chemiluminescent Hybridization and Detection Kit (PIERCE) for detection.

(Supplemental materials and methods are depicted in the Supplementary information, Data S1)