Comparative studies have been very effective at identifying conserved cis sequences that might have regulatory functions; the snag, however, is that only some of those sequences will actually be bound by a regulator. Christopher Harbison, Benjamin Gordon and colleagues have now brought some much-needed clarity to this area of eukaryotic transcription: by merging data from various sources — including phylogenetic information and protein–DNA binding data — they have generated a detailed map of how yeast transcription factors interact to transcribe the genome. It's all in there: which predicted promoter elements are functional, which regulators associate with them and how, and in what way the binding associations depend on the environment.

With several genome-wide regulatory studies to its name, the yeast Saccharomyces cerevisiae is an excellent starting point for defining eukaryotic transcription. The authors began with 203 DNA-binding regulatory proteins (probably all such proteins in the genome), and their first task was to find which sequences these proteins bind to. This was done by examining the genome-wide location of DNA-bound proteins at a high level of stringency. Next, they computationally defined the specific motifs that were bound with high confidence by 102 of these yeast regulators. To do so, they combined the regulator–DNA binding data with relevant published information and with sequence comparisons among Saccharomyces species, and validated previously identified regulator–DNA relationships. The resulting map, which consists of 3,353 interactions and 1,296 promoter regions, is doubly useful because it incorporates genome-wide binding data that were collected in different environments, such as varying cell-growth conditions.

The stringent approach with which the map was devised makes it a unique resource, but just as useful is the information that the authors were able to extract from it. For example, they found that regulator binding sites are not distributed at random, but are mostly clustered between 100 and 500 base pairs upstream of the coding region. They also defined four types of promoter based on how binding sites were organized, which in turn hints at how promoter architecture influences downstream gene transcription, for example through combinatorial protein interactions. The impact of including the effect of the environment was felt most obviously here, as promoters could be classified according to how growth-factor status or concentration, say, affected the number and type of promoter elements that were occupied.

With this framework in place, we can begin to model the mechanisms that underlie global gene transcription, and eventually to extend the same approach to multicellular eukaryotes.