Sequencing and comparison of yeast species to identify genes and regulatory elements


Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.
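The Figure 4 caption names a reading-frame conservation (RFC) test for rejecting spurious ORFs. The idea is that a real protein-coding gene keeps its reading frame in related species, so indels in orthologous sequence tend to come in multiples of three. A spurious ORF, by contrast, accumulates frame-shifting indels. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It takes one gapped pairwise alignment that starts at the ORF's first base and scores the fraction of aligned columns where both sequences sit at the same codon position. It then applies a cutoff of 0.8, which is an illustrative assumption. The function names and the threshold are hypothetical.

```python
from typing import Sequence


def reading_frame_conservation(ref_aln: str, orth_aln: str) -> float:
    """Fraction of aligned columns where both sequences share a codon position.

    Simplified, hypothetical version of a reading-frame conservation score:
    both arguments are rows of one gapped pairwise alignment ('-' marks a gap)
    that start at the first base of the ORF, so both begin in frame 0.
    """
    if len(ref_aln) != len(orth_aln):
        raise ValueError("alignment rows must have equal length")
    ref_pos = orth_pos = 0  # ungapped bases consumed so far in each row
    compared = in_frame = 0
    for a, b in zip(ref_aln, orth_aln):
        if a != "-" and b != "-":
            compared += 1
            # Same codon position (0, 1 or 2) in both rows: frame is conserved here.
            if ref_pos % 3 == orth_pos % 3:
                in_frame += 1
        if a != "-":
            ref_pos += 1
        if b != "-":
            orth_pos += 1
    return in_frame / compared if compared else 0.0


def looks_spurious(rfc_scores: Sequence[float], threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag an ORF whose frame is poorly kept in every ortholog."""
    return len(rfc_scores) > 0 and all(s < threshold for s in rfc_scores)


if __name__ == "__main__":
    print(reading_frame_conservation("ATGAAACCCGGG", "ATG---CCCGGG"))  # 1.0 (3-bp deletion)
    print(reading_frame_conservation("ATGAAACCC", "ATG-AACCC"))  # 0.375 (1-bp deletion)
```

In this sketch, a single 1-bp deletion pushes every downstream column out of frame, so it scores far lower than a 3-bp deletion that removes one whole codon.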


Figure 1: Aligned ORFs across four species. A 50-kb segment of S. cerevisiae chromosome VII aligned with orthologous contigs from each of the other three species.
Figure 2: Genome evolution.
Figure 3: Evolutionary tree of the four yeast species.
Figure 4: Spurious ORF rejected by RFC test.
Figure 5: Examples of proposed changes in gene structure.
Figure 6: Conservation in the GAL1-GAL10 intergenic region.
Figure 7: Distribution of motifs by conservation score.
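The Figure 7 caption ranks motifs by a conservation score, but this page does not define it. One standard way to ask whether a candidate motif's instances are conserved across species more often than chance is a binomial z-score against a background conservation rate. The Python below is a hedged sketch of that generic statistic, not the paper's published scoring method. The function name, the inputs, and the assumption that the background rate comes from control motifs are all illustrative.

```python
import math


def conservation_z_score(n_instances: int, n_conserved: int, background_rate: float) -> float:
    """Binomial z-score for an excess of conserved motif instances.

    Hypothetical inputs: n_instances is the number of motif occurrences in the
    reference genome's intergenic regions, n_conserved is how many of them are
    also present at the aligned position in the other species, and
    background_rate is the assumed conservation rate of control motifs.
    """
    if n_instances <= 0 or not 0.0 < background_rate < 1.0:
        raise ValueError("need n_instances > 0 and 0 < background_rate < 1")
    if not 0 <= n_conserved <= n_instances:
        raise ValueError("n_conserved must lie between 0 and n_instances")
    expected = n_instances * background_rate
    sd = math.sqrt(n_instances * background_rate * (1.0 - background_rate))
    return (n_conserved - expected) / sd


if __name__ == "__main__":
    # 50 of 100 instances conserved where 20% would be expected by chance.
    print(round(conservation_z_score(100, 50, 0.2), 2))  # 7.5
```

Under this sketch, a motif whose instances are conserved at roughly the background rate scores near zero. Large positive values flag motifs that are conserved much more often than expected.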




We thank D. Botstein, M. Cherry, K. Dolinski, D. Fisk, S. Weng and other members of the Saccharomyces Genome Database staff for assistance with SGD, for making our data available to the community through SGD, and for discussions; J. Butler, S. Calvo, J. Galagan, D. Jaffe, J. Lehar and L. Jun Ma for technical advice and discussions; the staff of the Whitehead/MIT Center for Genome Research Sequencing Center who generated the shotgun sequence from the three yeast species; T. Lee, N. Rinaldi, R. Young and J. Zeitlinger for sharing data about chromatin immunoprecipitation experiments and for discussions; M. Eisen and A. Gasch for sharing information about gene expression clusters and for discussions; E. Louis and I. Roberts for providing yeast strains and discussions; B. Berger, G. Fink, D. Gifford, S. Lindquist and H. True-Krobb for discussions; and L. Gaffney for assistance with figures.

Author information

Correspondence to Manolis Kellis or Eric S. Lander.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

Supplementary information

Supplementary Figure 1: nucleotide alignment for Figure 4 (PDF 124 kb)

Supplementary Figure 2: nucleotide alignment for Figures 5a, 5b, 5c (PDF 890 kb)

Supplementary Figure 3: nucleotide alignment for Figures 5d, 5e (PDF 1548 kb)

Supplementary methods and index to author’s website (DOC 71 kb)
