With the completion of the genome sequence of the budding yeast Saccharomyces cerevisiae in April 1996 (ref. 1), a eukaryotic organism could be analysed on a genomic scale for the first time. The challenge became one of understanding the roles of the approximately 6,000 gene products and how they interact to create a eukaryotic organism. However, one-third of the predicted yeast open reading frames (ORFs) are still classified as proteins of unknown function2. Genomic technologies have been developed that focus predominantly on the use of DNA array approaches to measure, for example, the expression of large sets of genes. As comparable strategies for protein analysis are not available, we initiated two types of systematic study to produce additional information that can place yeast ORFs within a biological context, with the goal of understanding their functional roles. We chose to use the yeast two-hybrid system because it can identify pairs of proteins that physically associate with one another3 and because two-hybrid screens are simple, sensitive and amenable to high-throughput methods. The first genomic analysis using the two-hybrid system centred on the T7 bacteriophage4. Large complexes in S. cerevisiae have also been analysed using this method of detecting protein–protein interactions5,6. Here we present the results of two complementary strategies using the two-hybrid screen to identify protein–protein interactions among the predicted ORFs of S. cerevisiae. A bioinformatics platform for the analysis of this data set is publicly available at

A protein array of activation-domain hybrids

To examine protein activity in a format that allows the assay of every predicted ORF, we constructed an array of hybrid proteins. At least two general types of protein array may be envisioned: those composed of living transformants, as described here, with each protein expressed in a form that allows expedient assay of the host cells; and those composed solely of the purified proteins7. The two-hybrid array used here is a set of yeast colonies derived from about 6,000 individual transformants. The transformation event inserts one of the yeast ORFs into a Gal4 transcription-activation domain vector to create a hybrid protein. To enable rapid, large-scale transformation, we generated those ORFs as a set of polymerase chain reaction (PCR) products with 70 base-pair sequences at their 5′ and 3′ ends that precisely matched sequences in the activation domain vector pOAD (ref. 8). These sequences allowed highly efficient recombination between the ORFs and the linearized vector. After transformation of a yeast two-hybrid reporter strain9, we pooled two colonies from each transformation plate to constitute a single array element (Fig. 1a), with the entire array contained on sixteen microassay plates of 384 colonies each.
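Because each array element occupies a fixed position, a positive colony can be identified from its plate coordinates alone. The following minimal sketch illustrates that idea. It assumes a conventional 384-position geometry of 16 rows by 24 columns filled in row-major order; the function name `array_position` and the ordering are illustrative assumptions, not the layout actually used in these screens.

```python
PLATES = 16              # number of plates, from the text
WELLS_PER_PLATE = 384    # colonies per plate, from the text
COLS = 24                # assumed: standard 16 x 24 geometry for 384 positions
ROW_LETTERS = "ABCDEFGHIJKLMNOP"

def array_position(index):
    """Map a 0-based element index to (plate, row letter, column).

    Plate and column numbers are 1-based. Row-major ordering is an
    assumption made for illustration only.
    """
    if not 0 <= index < PLATES * WELLS_PER_PLATE:
        raise ValueError("index outside the array")
    plate, well = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(well, COLS)
    return plate + 1, ROW_LETTERS[row], col + 1

# Total number of addressable elements: 16 plates x 384 positions.
capacity = PLATES * WELLS_PER_PLATE
```

Under these assumptions the array has 6,144 addressable positions, enough to hold one element for each of the roughly 6,000 predicted ORFs.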

Figure 1: The two-hybrid assay carried out by screening a protein array.

a, The array of 6,000 haploid yeast transformants plated on medium lacking leucine, which allows growth of all transformants. Each transformant expresses one of the yeast ORFs as a fusion to the Gal4 activation domain. b, Two-hybrid positives from a screen of the array with a Gal4 DNA-binding domain fusion of the Pcf11 protein, a component of the pre-mRNA cleavage and polyadenylation factor IA, which also consists of four other polypeptides36. Diploid colonies are shown after two weeks of growth on medium lacking tryptophan, leucine and histidine and supplemented with 3 mM 3-amino-1,2,4-triazole, thus allowing growth only of cells that express the HIS3 two-hybrid reporter gene. Three other components of factor IA, Rna14, Rna15 and Clp1, were identified as Pcf11 interactors. Positives that do not appear in Table 2 were either not reproducible or are false positives that occurred in many screens.

A set of 192 DNA-binding domain hybrids was generated in a similar way, by transforming PCR products together with a Gal4 DNA-binding domain vector into a strain of the opposite mating type. To screen for protein interactions, we mated a transformant containing one of the DNA-binding domain hybrids to all of the transformants of the array, selecting diploids using markers carried on the two-hybrid plasmids. The diploids were then transferred to selective plates deficient in histidine, and colonies positive for the two-hybrid reporter HIS3 gene were identified by their positions in the array (Fig. 1b). For each of the 192 screens, we typically obtained 1–30 positives. However, only around 20% of these positives were reproduced in a second screen of the array (Table 1a). Although the exact causes of this variability are not known, they appear to include an infrequent and protein-specific rearrangement of the DNA-binding domain plasmid to generate proteins that activate transcription on their own. As a consequence of this variability, we scored as putative interacting partners only those proteins that were identified in two independent screens, even though this criterion also resulted in the omission of some known interactions found only once. Overall, 87 of the 192 DNA-binding domain hybrids screened were identified in a putative protein–protein interaction, resulting in 281 interacting protein pairs (Table 1a).
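The scoring criterion described above, keeping only partners found in two independent screens of the array, amounts to a set intersection. A minimal sketch follows; the function name and the ORF labels are hypothetical placeholders, not data from the screens.

```python
def reproducible_positives(first_screen, second_screen):
    """Return the array elements scored positive in both screens."""
    return set(first_screen) & set(second_screen)

# Hypothetical positives from two screens with the same DNA-binding
# domain hybrid; the labels are placeholders, not experimental results.
first = {"orf_a", "orf_b", "orf_c", "orf_d", "orf_e"}
second = {"orf_a", "orf_c", "orf_f"}

kept = reproducible_positives(first, second)
dropped = (first | second) - kept   # positives seen in only one screen
```

In this made-up example only `orf_a` and `orf_c` would be scored as putative interactors. As the text notes, a criterion of this kind also discards genuine interactions that happen to be detected only once.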

Table 1 Summary of experimental results

High-throughput screens of an activation-domain library

As an alternative method of genomic two-hybrid analysis, we developed high-throughput screens based on a library made by pooling transformants containing the roughly 6,000 potential ORFs fused to the Gal4 activation domain. The hybrid proteins were derived similarly to those in the two-hybrid array, with each of the potential ORFs cloned separately into a Gal4 DNA-binding domain vector in addition to the Gal4 activation domain vector. The same PCR products and recipient plasmids were used to generate two collections of transformants, each consisting of 64 barcoded 96-well plates. Of the 6,144 ORFs, 5,345 (87%) were successfully cloned into both plasmids (Table 1b). Transformants from the Gal4 activation domain collection were then pooled to form an activation-domain library. To screen for protein interactions, we mated each DNA-binding domain hybrid transformant in duplicate to the activation domain library. Mating mixes were transferred to selective plates to select diploid cells that expressed interacting pairs and activated both reporter genes (URA3 and lacZ). Experiments were conducted in 96-well assay plates using semi-automation and computerized sample tracking to perform such large-scale transformation and mating reactions quickly.

Overall, 817 yeast ORFs (15% of the successfully cloned yeast ORFs) were identified in a putative protein–protein interaction by this approach, resulting in 692 interacting protein pairs (Table 1b). To assess the coverage of the screens, we classified the interactions according to their frequency. Sixty-eight per cent of the putative interactions were identified in independent experiments (41%) or multiple times in a single experiment (27%). The remaining 32% of the interactions were identified only once during the screening process. This moderate coverage reflects the depth of the experiment determined by the number of colonies per mating submitted for sequence analysis. To complete this study in a timely and cost-effective manner, we selected only 12 colonies from each mating. This number was chosen to validate this genome-wide high-throughput approach as well as to generate significant data of scientific interest.
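The frequency classes used above can be made concrete with a short sketch. It assumes each sequenced positive is recorded as a (bait, prey, mating) triple, where duplicate matings of the same bait carry different mating identifiers; this record format, the function name `classify_pairs` and the example hits are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter, defaultdict

def classify_pairs(hits):
    """Classify each (bait, prey) pair by how often it was recovered.

    'independent': found in two or more different matings
    'repeated':    found several times, but within a single mating
    'single':      found exactly once
    """
    per_pair = defaultdict(Counter)
    for bait, prey, mating in hits:
        per_pair[(bait, prey)][mating] += 1

    classes = {}
    for pair, matings in per_pair.items():
        if len(matings) >= 2:
            classes[pair] = "independent"
        elif max(matings.values()) >= 2:
            classes[pair] = "repeated"
        else:
            classes[pair] = "single"
    return classes

# Hypothetical sequenced positives (placeholder names, not real data).
hits = [
    ("bait1", "preyA", "mating1"),
    ("bait1", "preyA", "mating2"),   # same pair, second mating
    ("bait1", "preyB", "mating1"),
    ("bait1", "preyB", "mating1"),   # same pair, same mating
    ("bait2", "preyC", "mating3"),
]
classes = classify_pairs(hits)
```

With only 12 colonies sequenced per mating, weakly represented pairs are more likely to fall into the 'single' class, which is consistent with the moderate coverage described above.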

Comparison of the approaches

The array screens and the library screens gave different data sets (Table 2). Forty-five per cent of the 192 proteins used in the array screens yielded interactions, compared to 8% of the 5,345 potential ORFs in the library screens. Some of the difference in the number of proteins that resulted in positives with each approach is attributable to the nonrandom choice of proteins for the array screens (some categories of proteins, such as membrane proteins and metabolic enzymes, are less likely to yield interacting partners, whereas signalling proteins are more likely to). However, for proteins that identified at least one binding partner, the array screens gave an average of 3.3 positives per protein, whereas the library screens gave an average of 1.8. In addition, the 12 positive DNA-binding domain hybrids common to both screens yielded 48 putative interactions in the array screens and 14 in the library screens. Thus, although the library approach permits a much higher throughput, the array screens generate more candidate interactors. The higher yield of candidates in the array screens can be partially attributed to the stringency of the selection procedures; the His selection for 14 days in the array screens was less stringent than the Ura selection for 4 days in the library screens. More significantly, the pooling of activation-domain transformants may select against interactions that involve cells with reduced growth rate or mating ability, and sequence analysis of twelve positives may identify only strongly interacting pairs. In this regard, the array may facilitate the detection of interactions that result in very low reporter gene activation, in that a single positive array element on the two-hybrid selection plate may be composed of many small, slow-growing colonies. However, the array screens are much more labour- and material-intensive, and require several hours of robot time per screen, thus severely limiting the number of screens that can be performed.

Table 2 List of the 957 interactions identified in both two-hybrid screens

We were interested in examining the putative protein–protein interactions identified in these screens with reference to their functional roles according to the Yeast Protein Database (YPD) classification10. Representative proteins from 41 of the 43 YPD categories were identified in the screens (Table 3). Of the 1,004 active proteins, 412 fell into the ‘unclassified’ category. These 412 proteins yielded 509 distinct interactions, of which 164 (32%) were between proteins with no functional classification. This observation indicates that there may be a significant number of as yet undiscovered pathways and/or complexes that can be identified using systematic approaches. Results from both approaches were also compared with a compilation of literature-cited interactions (Table 2). Of the 957 putative interactions identified by the two approaches combined, at least 109 have been previously reported by others using two-hybrid, co-immunoprecipitation, copurification or affinity column techniques2,10. That only a subset of previously described interactions has been detected in our work can be attributed to specific features of the screens: the exclusive use of full-length proteins as both DNA-binding and activation-domain fusions and the versions of the two-hybrid system used, which include Gal4 as the DNA-binding domain fusion protein and centromeric plasmids. Each of these components can affect the sensitivity of the assay11. Additionally, sequence analysis of 200 ORF constructs used in constructing the array indicates that 15% of the recombinant plasmids may lack insert, another 3% may contain frameshifting errors, and about 5% of all colonies failed to grow. Thus, we estimate that the array contains 85–90% of the yeast ORFs, given that each element is composed of two individual transformants.

Table 3 Interactions grouped by protein ‘cellular roles’ (as classified by the Yeast Protein Database, YPD (ref. 10))

Results of the systematic two-hybrid screens

To examine the large number of potential interactions, we developed a new bioinformatics platform which, along with the complete set of data generated by these screens, is publicly available online. The software allows users to search for information on putative protein interactions identified in the screens or reported in the literature, to perform sequence analyses and view the results, to extend interactions to construct pathways and to view the homologues of the yeast genes in a number of species using the three-dimensional homologue viewer (see Fig. 2).

Figure 2: Data analysis software.

a, The putative interaction identified between Mad3 and Bub3 which connects the spindle checkpoint complex37 and the microtubule checkpoint complex38. Yeast proteins are shown as yellow spheres with the name of each gene indicated. Interactions in Figs 2 and 3 are shown as black lines (from literature), solid green lines (from library screens in independent matings), dashed green lines (from library screens in one mating), purple (from array screens) and blue lines (from literature and screens). Arrows point away from the protein used as the binding-domain clone when the interaction was identified. Grey nubs indicate other proteins that interact with that protein but have not been expanded. b, Same pathway shown using the homologue viewer; the pathway can be rotated and homologous proteins in human, mouse, rat, Drosophila, Caenorhabditis elegans and Escherichia coli can be displayed. Known interactions between proteins in other species can be viewed: for this pathway interactions between the human proteins hMad3/hBub3, hBub3/hBub1, hBub1/hMad1, hBub3/hMad1 and hCDC20/hMad2, shown in black, are reported in the literature39,40,41. The distance of each species protein icon (shown in key) from the yeast proteins (shown in yellow) represents the amount of overall similarity between the species. The size of the protein icon in each corresponding species indicates the amount of homology with the specific yeast protein. As the human homologues are highlighted in this example, their gene names are shown.

First, by using systematic approaches to analyse the yeast genome, we could identify interactions that place functionally unclassified proteins into a biological context. For example, two proteins of unknown function, YGR010W and YLR328W (77% identical), were observed to interact with each other. Additionally, both proteins bind to ornithine aminotransferase (Car2), indicating that they may be involved in arginine metabolism. Human ornithine aminotransferase (OATase) can complement an ornithine aminotransferase-deficient strain of S. cerevisiae, and mutations in human OATase cause gyrate atrophy of the choroid and retina12.

Data from this study provide evidence of links between two proteins involved in autophagy, Apg13 and Apg1, and proteins of the Cvt (cytoplasm-to-vacuole targeting) pathway, Lap4, Vma22 and Vma6. Autophagy is a degradation pathway used under conditions of nutrient stress to nonselectively recycle cytoplasmic proteins and organelles to their constituent components, whereas the Cvt pathway is a biosynthetic pathway that transports the vacuolar enzyme aminopeptidase I (API, encoded by LAP4) specifically to the vacuole13. Several mutations in the Cvt pathway (cvt) and autophagocytosis (aut and apg) are allelic, indicating that both pathways may utilize some of the same molecular components14. Our study implicates a number of ORFs encoding proteins of unknown functions as potential components of autophagy (Fig. 3a). As several of the genes altered in apg, aut and cvt mutants have not yet been cloned, ORFs found in these interactions could be examined to determine whether they encode any of these genes. This study has also shown Lap4 to be a self-interactor, corroborating previous evidence that Lap4 assembles into a dodecamer15, and the interaction between Apg1 and Apg13 lends support to previous genetic evidence indicating that APG1 may be a high-copy suppressor of apg13 (ref. 16).

Figure 3: Expanded pathways shown using the software as described in the text.

a, Autophagy pathway illustrating potential novel interactions that place functionally unclassified proteins in a biological context. b, Potential interactions identified by screens of the Sm motif-containing proteins Lsm2, Lsm4 and Lsm8. c, The Clb/Cdc28/Cks1 complex shows novel interactions between proteins involved in the same biological function. d, The Msh5 pathway illustrates novel interactions that link biological functions together into greater cellular processes.

Second, genome-wide two-hybrid approaches offer insight into novel interactions between proteins involved in the same biological function. For example, we screened four proteins (Lsm2, Lsm4, Lsm8 and Prp11) that are known or suspected to be involved in RNA splicing and that had been previously analysed or identified in another systematic set of two-hybrid screens5 using a random library of activation-domain hybrids. Three of these contain the Sm1 and Sm2 motifs found in a core set of proteins associated with small nuclear RNAs (snRNAs) involved in splicing17,18; the yeast Sm proteins are homologous to the eight common Sm proteins identified in mammalian cells. Screens of the three Sm proteins identified other Sm proteins (Table 2, Fig. 3b): the D1 homologue Lsm2 binds B (Lsm1), D2 (Smd2), E (Lsm5), F (Lsm6) and G (Lsm7) homologues; the D3 homologue Lsm4 binds B, F and G homologues and Lsm8; and Lsm8 binds D1, E, F and G homologues. These results support a proposed model19 based on crystallographic, biochemical and genetic data, but also include interactions that are not predicted by the model and that might reflect exchangeability of proteins within the complex, additional contacts within or between subcomplexes, or bridging effects in the two-hybrid assay.

In yeast, diverse cyclins bind to Cdc28 in a coordinated manner to modulate its kinase activity during the cell cycle. The B-type cyclins are critical in the induction of bipolar mitotic spindle formation20. Each of the B-type cyclins, Clb1, Clb2 and Clb3, has been observed to be in a complex with Cks1 and Cdc28. Our observation of two-hybrid interactions between Cks1 and each of Clb1, Clb2 and Clb3 indicates that the kinase activity of Cdc28 could be regulated by cyclin Bs through their interaction with Cks1 (Fig. 3c).

Third, novel interactions that connect biological functions into larger cellular processes can be gleaned from our screens. The Sm proteins Lsm2, Lsm4 and Lsm8 identified ribosomal protein S28 (producing a signal with both Rps28a and Rps28b, the nearly identical copies of this protein), which may reflect an unusual involvement of Rps28 in splicing, or of the snRNP proteins in translation or ribosome biogenesis (Fig. 3b). Unexpectedly, a characterization of the mammalian spliceosome complex by two-dimensional gel separation and mass spectrometry identified a different ribosomal protein, Rps4x, as a previously unknown spliceosome-associated protein21. The three Sm proteins also interacted with Dcp1, a messenger RNA-decapping enzyme, consistent with the role of Lsm1 (Spb8) in decapping22. Finally, a screen with Dcp1 found Rps28b and thus provides further evidence for a functional interplay between these proteins. Both Lsm2 and Lsm8 identified Mtr3, implicated in mRNA transport23, suggesting another possible link between splicing and other processes.

The meiosis-specific protein Msh5 is required for the resolution of crossovers during meiosis24. Meiotic recombination is initiated by double-strand breaks (DSBs), a prerequisite to crossover formation that is resolved in a structure called the synaptonemal complex. Mre11 is part of a complex that participates in DSB formation25. It is also known that Tid3 helps form the spindle pole body and interacts with Dmc1 (ref. 26), a protein that is required for the formation of the synaptonemal complex. We observed Msh5 to interact with both Mre11 and Tid3 (Fig. 3d). These novel associations tie DSB formation and the resolution of crossovers with Msh5 as the linking protein.


We have detected novel interactions for proteins previously screened by other workers who used the two-hybrid assay with activation-domain libraries of randomly generated inserts. Some of these new partners may reflect the comprehensive nature of the array and library approaches or the requirement for a full-length protein to detect some interactions. However, most of our data represent new potential interactions for proteins that have not been previously searched in the two-hybrid assay. Of the new interactions, many seem credible on the basis of genetic or other criteria, whereas the relevance of others cannot be easily assessed. However, some reproducible two-hybrid signals are unlikely to reflect true interactions, based on the known properties of the proteins involved. Thus, in the absence of other data, the set of proteins derived from each of these screens should be viewed as potential positives, serving to motivate other experiments that confirm or eliminate particular proteins as plausible interactors.

As part of these studies, we have developed a protein array approach as a new method for systematic genome-wide analysis. Arrays of biomolecules possess unique advantages for the handling and investigation of multiple samples. They provide a fixed location for each element such that those scoring positive in an assay are immediately identified; they have the capacity to be comprehensive and of high density; they can be screened by high-throughput robotic procedures using small volumes of reagents; and they allow the comparison of each assay value with the results of many other identical assays. Moreover, chemically pure protein arrays can be constructed by generating proteins fused to an affinity tag and recovering them by cell lysis and affinity purification7. The yeast activation-domain array and library described here also allow the detection in vivo of nucleic-acid–protein and small-molecule–protein interactions, through the use of hybrid molecules27,28,29,30,31, in a manner similar to that for protein–protein interactions.

The publication of the complete genome sequence of Caenorhabditis elegans and the escalating efforts to complete the sequences of other genomes increase the need for high-throughput functional studies. The studies described here represent the first comprehensive biological screens that use the complete set of predicted ORFs from a eukaryotic organism. Both of the approaches could be scaled up for larger sets of proteins, such as those encoded by Caenorhabditis elegans or Drosophila melanogaster. The high-throughput library approach is reasonable to employ in order to complete a screen of all encoded ORFs for either of these organisms; however, the array approach, while much more time- and labour-intensive, would probably provide more positives. The bioinformatics platform we have developed can incorporate new data sets as they become available and compare results across species to identify conserved interactions. Systematically applying multiple high-throughput strategies should increase our understanding of yeast and other eukaryotic organisms.


Gap-repair cloning

We constructed transformants containing activation-domain hybrids by recombination32 of the linearized vector pOAD (ref. 8) with PCR fragments corresponding to each of the yeast ORFs8. The DNA-binding domain vector pOBD2 was constructed by introducing 48 base pairs from pOAD (ref. 8), encoding residues 866–881 of the Gal4 activation domain, into pOBD (ref. 8) immediately 3′ to the DNA that encodes residue 147 of the Gal4 DNA-binding domain (S. McCraith and S.F., unpublished). This plasmid provides activation-domain sequences to allow cloning by recombination of the identical PCR fragments used to construct activation-domain hybrids. Transformation was carried out in a 96-well format using the lithium acetate procedure33. Yeast media were prepared as described34.

Generation of the array.

After transformation, cells were plated on 35-mm synthetic plates without leucine. The yeast recipient for the activation-domain hybrid plasmids was PJ69-4a (ref. 9), which is MATa. Two colonies from each transformation plate were pooled, cultured in liquid medium lacking leucine and transferred to solid medium lacking leucine in OmniTrays (Nalge Nunc International) by a Biomek 2000 Laboratory Automation Workstation (Beckman). We constructed an isogenic MATα derivative by switching the PJ69-4a strain using a plasmid carrying the HO gene. Selection for transformants carrying DNA-binding domain hybrids used synthetic plates without tryptophan.

Collections for the library screens.

Five microlitres from individual transformations were grown on selective medium lacking leucine or tryptophan for two days at 30 °C. Patches of transformants were manually transferred into individual wells on micro-assay plates for further use. The yeast recipients were YULH (MATa ura3-52 trp1 lys2 his3 leu2 gal4 gal80 GAL1–URA3 GAL1–lacZ) for the Gal4 DNA-binding domain fusion in pOBD2 and N106r (MATα ura3-52 his3 ade2 trp1 leu2 gal4 gal80 cyh2 lys2::GAL1–HIS3 ura3::GAL1–lacZ) for the Gal4 AD fusion in pOAD.

Screening procedure

Array screening.

Transformants of the a and α reporter strains were mated on YEPD plates for 2–3 days at 30 °C by transferring 1 ml of an overnight culture of the MATα strain expressing a DNA-binding domain hybrid onto each of 16 OmniTrays using a 384-pin High Density Replicating Tool (Beckman), and then pinning the activation-domain array transformants of the MATa strain onto the same positions. Diploids were selected by transfer with the replicating tool to medium without leucine and tryptophan, followed by 2–3 days of further growth. For the two-hybrid selection, the diploids were transferred to medium without leucine, tryptophan and histidine supplemented with 3 mM 3-amino-1,2,4-triazole and scored after two weeks of growth at 30 °C. Further details about the array screening procedure are available at

Library screening.

Mating reactions were performed on 96-well filter plates (Millipore MAHV S45) by mixing 10⁷ MATa cells (Gal4 DNA-binding domain fusion) with 5 × 10⁶ MATα cells (activation domain library) from liquid cultures in complete medium (YPAD). After filtration, the 96-well filter plates were incubated overnight at 30 °C on rectangular YPAD solid medium plates. We collected cells from the mating mixes from each filter with sterile water. A semi-automated Zymark work station was used throughout the cloning and screening procedures. Diploids containing potential interactors were selected for 4 days at 30 °C on medium lacking leucine, tryptophan and uracil, and simultaneously screened for lacZ expression by the addition of X-gal. Each mating generated 5 × 10⁵ to 10⁶ original diploids per well, suggesting that the library was covered 80–160 times. Up to 12 blue colonies were picked per mating, generating a collection of 96-well plates of diploid clones as the final product of the screening process. The activation-domain fusion plasmids were submitted for PCR amplification and sequencing to identify the yeast ORF. A total of 8,676 blue colonies were picked from the screens: 6,909 (80%) passed PCR, sequencing and vector trimming, and 6,215 (72%) passed interaction quality control. The resulting sequences were compared with the yeast sequence database using Blast2 (ref. 35). Sample handling and manipulation during the screens was tracked by computer and data analysis was carried out using GeneScape, web-based software developed at CuraGen.
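The coverage and pass-rate figures quoted above follow from simple arithmetic, sketched below. The sketch assumes the library contains about 6,144 members (the number of ORFs processed), an assumption that reproduces the stated 80–160-fold range; the variable and function names are illustrative.

```python
LIBRARY_SIZE = 6144   # assumed library complexity: one member per ORF processed

def fold_coverage(diploids, library_size=LIBRARY_SIZE):
    """Average number of diploids obtained per library member."""
    return diploids / library_size

low = fold_coverage(5e5)    # lower bound of diploids per well
high = fold_coverage(1e6)   # upper bound of diploids per well

picked = 8676         # blue colonies picked
sequence_ok = 6909    # passed PCR, sequencing and vector trimming
quality_ok = 6215     # passed interaction quality control

sequence_rate = round(100 * sequence_ok / picked)   # per cent of picked colonies
quality_rate = round(100 * quality_ok / picked)     # per cent of picked colonies
```

Under this assumption the coverage works out to roughly 81-fold and 163-fold, and the two pass rates round to 80% and 72%, matching the values reported in the text.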