Abstract
Interpretation of high-throughput biological data requires a knowledge of the design principles underlying the networks that sustain cellular functions. Of particular importance is the genetic network, a set of genes that interact through directed transcriptional regulation. Genes that exert a regulatory role encode dedicated transcription factors (hereafter referred to as regulating proteins) that can bind to specific DNA control regions of regulated genes to activate or inhibit their transcription. Regulated genes may themselves act in a regulatory manner, in which case they participate in a causal pathway. Looping pathways form feedback circuits. Because a gene can have several connections, circuits and pathways may crosslink and thus represent connected components. We have created a graph of 909 genetically or biochemically established interactions among 491 yeast genes. The number of regulating proteins per regulated gene has a narrow distribution with an exponential decay. The number of regulated genes per regulating protein has a broader distribution with a decay resembling a power law. Assuming in computer-generated graphs that gene connections fulfill these distributions but are otherwise random, the local clustering of connections and the number of short feedback circuits are largely underestimated. This deviation from randomness probably reflects functional constraints that include biosynthetic cost, response delay and differentiative and homeostatic regulation.
Main
In integrating genome-wide data on transcript abundance (ref. 1) into a dynamic view of gene networks, recent studies have focused on abstracting the principles that underlie the architecture and causal interplay of these networks. At present, the yeast Saccharomyces cerevisiae is the most suitable eukaryotic organism for achieving this goal, as much information about its transcriptional regulation has been accumulated (refs 2,3). Of roughly 6,000 yeast genes, 124 have been shown through genetic and biochemical experiments to encode regulating proteins that can influence the expression of specific genes (ref. 2). These data were obtained from a previous review (ref. 2) and were validated and updated, until July 2001, by manual inspection of the websites of MIPS, SwissProt, Yeast Protein Database, S. cerevisiae Promoter Database and the Saccharomyces Genome Database (see Web Note A online). The elements of the general transcription initiation machinery were excluded from this study, although some have differential roles in transcription of large subsets of genes (ref. 3). Some of the 124 regulatory genes transcriptionally control a set of 367 non-regulatory genes (Fig. 1) through 837 connections (see Web Table A online). Of the 124 regulatory genes, 52 interact with themselves or with other regulatory genes through 72 additional links (see Web Table A online). A transcriptional regulatory network can thus be represented as a graph where vertices are genes and directed edges denote activating or repressing effects on transcription. The graph of these 52 'interregulatory' genes consists mainly of several small disconnected components (Fig. 1).
Most networks fall into two major categories on the basis of their connectivity distribution, $p_k$, which represents the probability that a vertex in the network is connected to $k$ other vertices. One category of networks is characterized by a $p_k$ that peaks at an average $k_{\mathrm{mean}}$ and decays exponentially for large $k$ (refs 4,5). In these exponential networks, most vertices have approximately the same number of links. By contrast, metabolic pathways (refs 6–8) belong to a category of nonhomogeneous networks, where $p_k$ decays as a power law. As the connections are inherently oriented in a transcriptional regulatory network, we separately analyzed the number of regulating proteins per regulated gene (arriving connectivity) and the number of regulated genes per regulating protein (departing connectivity), to determine whether they were best described by the exponential or power-law models.
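As a rough illustration of how the two connectivities could be tabulated, the following Python sketch counts regulators per regulated gene and targets per regulating protein from a list of (regulator, target) pairs. The function name, the edge-list representation and the toy genes are assumptions made for illustration; they are not part of the original analysis.

```python
from collections import Counter


def connectivity_distributions(edges):
    """Return (arriving, departing) distributions p_k from directed edges.

    Each edge is a (regulator, target) pair. Only genes with at least one
    connection of the relevant kind are counted, so k starts at 1.
    """
    unique_edges = set(edges)  # ignore duplicate reports of the same link
    # Arriving connectivity: number of regulators per regulated gene.
    arriving = Counter(target for _, target in unique_edges)
    # Departing connectivity: number of targets per regulating protein.
    departing = Counter(regulator for regulator, _ in unique_edges)

    def normalise(degree_by_gene):
        histogram = Counter(degree_by_gene.values())
        total = sum(histogram.values())
        return {k: n / total for k, n in sorted(histogram.items())}

    return normalise(arriving), normalise(departing)


# Hypothetical toy network, not taken from the yeast data set.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "C")]
p_in, p_out = connectivity_distributions(toy_edges)
```

In this toy case a self-regulating gene ("C") contributes to both its own arriving and departing counts, which matches the convention that a self-interaction is an ordinary directed edge.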
The arriving connectivity of the yeast network has an exponential distribution, with 93% of the genes being regulated by 1–4 regulating proteins (Fig. 2a). The probability $p_k$ that a given target gene is regulated by $k$ regulating proteins decreases roughly as $C e^{-\beta k}$ ($C$ is a constant), with $\beta \sim 0.45$ for both the total set of regulated genes and its interregulatory subset. The available data for Escherichia coli (ref. 9) are compatible with an exponential distribution of arriving connections, with $\beta \sim 1.2$; this higher $\beta$ coefficient means that fewer targets have many regulators. This coefficient thus reflects the molecular limits on the number of regulating proteins that can combinatorially exert an effect on the target gene expression. Consequently, lower coefficients are predicted for multicellular organisms with a more sophisticated genetic regulatory machinery.
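One common way to estimate a decay coefficient of this kind is a straight-line fit to the logarithm of the distribution. The Python sketch below shows that approach on synthetic values; the text does not state which fitting procedure was actually used, so the least-squares method, the function name and the synthetic data are all assumptions.

```python
import math


def fit_exponential_decay(p_k):
    """Least-squares fit of log(p_k) = log(C) - beta * k; returns (C, beta)."""
    points = [(k, math.log(p)) for k, p in p_k.items() if p > 0]
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((k - mean_k) * (y - mean_y) for k, y in points)
             / sum((k - mean_k) ** 2 for k, _ in points))
    intercept = mean_y - slope * mean_k
    return math.exp(intercept), -slope


# Synthetic, unnormalised values generated with beta = 0.45 and C = 1;
# these are NOT the measured yeast frequencies.
synthetic = {k: math.exp(-0.45 * k) for k in range(1, 11)}
C, beta = fit_exponential_decay(synthetic)
```

On real, noisy counts the sparse tail (large k) can dominate a log-scale fit, so the result should be read as indicative only.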
The departing connectivity of the yeast network does not seem to follow an exponential law (Fig. 2b). A power law fits it better, although there are insufficient data to rule out other possibilities. The probability $p_k$ that a given regulating protein regulates $k$ target genes decreases approximately as $C k^{-\gamma}$, with $\gamma \sim 1$ for both the global set of 124 regulatory proteins and its interregulatory subset. For E. coli as well, $\gamma \sim 1$ (our best fit computed from ref. 9; see also refs 8,10). Because $\gamma \sim 1$, the number of departing connections ($k\,p_k \sim k\,C k^{-1} = C$) is distributed almost equally over $k$, unlike the connections present in metabolic networks ($\gamma \sim 1.5$–$3$; refs 6–8). Thus, bacterial and fungal genetic networks are free of a characteristic scale with respect to the distributions of both regulating proteins and departing connections.
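The equal-distribution remark follows from a one-line calculation, written out here for a general exponent. The quantity $k\,p_k$ is proportional to the share of departing connections carried by regulators with exactly $k$ targets; the value $\gamma = 2$ is only an illustrative choice within the range quoted for metabolic networks.

```latex
\[
p_k \simeq C\,k^{-\gamma}
\quad\Longrightarrow\quad
k\,p_k \simeq C\,k^{1-\gamma}
\]
\[
\gamma = 1:\quad k\,p_k \simeq C
\qquad\qquad
\gamma = 2:\quad k\,p_k \simeq C\,k^{-1}
\]
```

With $\gamma = 1$ the share is independent of $k$, whereas for any $\gamma > 1$ it falls as $k$ grows, so most connections then belong to weakly connected vertices.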
The differing distribution laws for arriving and departing connectivities suggest that there is a correlation between them. A joint distribution (Fig. 2c) shows that genes with few regulators also tend to have few targets. Because there are many such genes, inactivating a gene selected at random has a low probability of altering the pathway structure of other genes. In contrast, inactivating one of the few highly connected genes would greatly decrease the communication between the remaining genes (ref. 11) and could be lethal. Of 124 regulatory genes, 10 are essential, including 6 interregulatory genes that tend to be located upstream in the causal graph (Fig. 1). Indeed, their overall influence (direct and indirect targets) is on average twice as large as that of nonessential genes.
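The "overall influence" of a gene, meaning its direct and indirect targets, can be read as the set of vertices reachable from it in the directed graph. The Python sketch below computes that set with a breadth-first search; the function name and the toy edges are hypothetical, and the exact measure used for the comparison above may differ.

```python
from collections import defaultdict, deque


def downstream_influence(edges, gene):
    """Return the set of direct and indirect targets of `gene`.

    The gene itself is included only if it lies on a feedback circuit,
    that is, if it can be reached again from one of its own targets.
    """
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)

    reached = set()
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        for nxt in targets[current]:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Hypothetical toy network: D sits upstream of A, and B and C form a circuit.
toy_edges = [("D", "A"), ("A", "B"), ("B", "C"), ("C", "B")]
influence_of_d = downstream_influence(toy_edges, "D")
influence_of_c = downstream_influence(toy_edges, "C")
```

Comparing the sizes of such sets for essential and nonessential regulators would give one possible version of the twofold comparison described in the text.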
To evaluate the generality of the predicted topology, two things must be determined: (i) to what extent the present compilation differs from a complete yeast data set and (ii) whether the observed global topology is likely to hold true as more data accumulate. On the basis of sequence homology, at most 77 additional yeast genes encode putative regulating proteins (see Web Note A online); however, recent work has investigated the genome-wide locations of 12 DNA-binding proteins, using chromatin immunoprecipitation and microarrays (refs 12–15). Depending on the laboratory, the number of targets thus obtained is on average 3.5-fold (refs 12,15) or 26-fold (refs 13,14) greater than the number found here for the same regulators (see Web Table A online). Although the exact number of targets depends on a somewhat arbitrary threshold, it is already clear that this new method has the potential to reveal many unsuspected links (refs 12–15). It is therefore essential to re-evaluate the topology of the yeast network once a sufficient set of regulatory genes has been studied with this genome-wide approach and universal threshold definitions. Moreover, theoretical considerations, consistent with the comparison of a subset to the whole set (Fig. 2a,b), suggest a way in which future data may affect the described network structure. If departing connectivity is free of a characteristic scale, future data should presumably not alter the power-law parameters. If arriving connectivity is shaped by the sophistication of the regulatory machinery, additional data would probably increase $C$ while maintaining $\beta$.
To assess how accurately various models represent the biological situation, the actual yeast genetic network (a) was compared with directed random graphs modeled under three assumptions (see Web Note B online and Fig. 2): the connectivity distributions conform with (b) the empirical data, (c) the laws deduced in Fig. 2 and (d) a Poisson law. A uniformly distributed connectivity (d) favors the emergence of a connected component that comprises the majority of the genes (Table 1), which is not observed. By contrast, both random graphs with constrained connectivity distributions (b or c; Fig. 2) reasonably approximate the average number of neighbors one or two steps away. At a more refined grain, however, they are no longer acceptable approximations. The local attribution of a few edges per vertex in a sparse graph is an important parameter that affects the network dynamics. It could be uniform, as in random graphs (ref. 4), or highly clustered, as in small worlds (ref. 5); extreme local clustering would result in global fragmentation, unlike small worlds, which still retain large connected components. Global fragmentation is observed (Fig. 1), beyond that expected from the empirical data or the deduced laws (b or c; Table 1). A clustering coefficient has been proposed to quantify the propensity of the links reaching an individual to involve him or her in local social interactions within 'cliques' (ref. 5). Because genetic networks are directed, we introduce the notion of upstream or downstream 'semi-cliquishness' (see Web Note B online). The corresponding semi-clustering coefficients are approximately fivefold higher than those expected for the yeast network in a constrained random graph (Table 1). Along the same lines, the total number of observed feedback circuits is fivefold higher than that predicted by (b) or (c), and 14-fold higher for single-gene circuits (Table 1).
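To give a feel for this kind of comparison, the Python sketch below builds a degree-preserving randomisation of an edge list and counts short feedback circuits in the observed and randomised graphs. It is only a stand-in for model (b): the actual generation procedure is described in Web Note B, which is not reproduced here, so the target-shuffling scheme, the function names and the toy edges are all assumptions.

```python
import random


def shuffle_targets(edges, rng):
    """Randomly reassign targets while keeping every gene's degree counts.

    Each regulator keeps its number of departing connections and each target
    keeps its number of arriving connections. Duplicate edges may appear;
    this sketch leaves them in place rather than resampling.
    """
    regulators = [r for r, _ in edges]
    targets = [t for _, t in edges]
    rng.shuffle(targets)
    return list(zip(regulators, targets))


def count_short_circuits(edges):
    """Count single-gene circuits (A->A) and two-gene circuits (A->B->A)."""
    unique = set(edges)
    single = sum(1 for r, t in unique if r == t)
    # r < t ensures each reciprocal pair is counted once.
    double = sum(1 for r, t in unique if r < t and (t, r) in unique)
    return single, double


# Hypothetical toy network, not taken from the yeast data set.
toy_edges = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]
observed = count_short_circuits(toy_edges)
randomised = shuffle_targets(toy_edges, random.Random(0))
```

Averaging `count_short_circuits` over many randomised graphs would give an expected circuit count against which the observed count could be compared.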
These circuits are crucial to the dynamics of the system. Positive circuits comprise an even number of inhibitory interactions and contribute to multistationarity, whereas negative circuits comprise an odd number of inhibitory interactions and contribute to homeostasis (ref. 16). In this view, higher organisms are expected to rely more heavily than lower ones on positive circuits, particularly to achieve cellular differentiation, with each cell type corresponding to one of several stationary states. We observed five negative and six positive circuits in 52 yeast interregulatory genes (Fig. 1). As expected, this is in marked contrast to the genes of E. coli, where 45 circuits (39 negative, 3 positive, 3 dual) were observed for 55 interregulatory genes (ref. 9). Yeast positive circuits control switching processes, such as those leading to pseudohyphal growth (YJL110C/YKR034W, controlled by YER040W; ref. 17), sporulation (YJR094C; ref. 18) or multiple-drug resistance (YBL005W; ref. 19). Negative circuits are constituted by (self-) inhibitors that finely control responses to the absence of glucose (YGL035C; ref. 20), DNA damage (YLR176C; ref. 21) or oxygen (YPR065W; ref. 22).
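The parity rule for circuit sign is simple enough to state as code. The Python sketch below classifies a circuit from the signs of its interactions; the encoding of activation as +1 and inhibition as -1 and the function name are assumptions chosen for illustration, and 'dual' circuits, whose interactions can change sign, are not handled.

```python
def circuit_sign(interaction_signs):
    """Classify a feedback circuit from the signs of its interactions.

    `interaction_signs` lists +1 (activation) or -1 (inhibition) for each
    edge around the circuit. An even number of inhibitions gives a positive
    circuit; an odd number gives a negative circuit.
    """
    if not interaction_signs or any(s not in (1, -1) for s in interaction_signs):
        raise ValueError("expected a non-empty sequence of +1 / -1 values")
    inhibitions = sum(1 for s in interaction_signs if s == -1)
    return "positive" if inhibitions % 2 == 0 else "negative"


self_inhibitor = circuit_sign([-1])         # single-gene negative circuit
mutual_inhibition = circuit_sign([-1, -1])  # two inhibitions: positive circuit
```

A self-inhibiting gene is therefore the smallest negative circuit, and two mutually inhibiting genes form a positive circuit.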
As a whole, the yeast transcriptional regulatory network combines a small maximal diameter, an elevated local semi-clustering, a high number of feedback circuits and a global fragmentation. This departure from a random distribution must reflect functional constraints. Indeed, each small connected piece implements a biological function, and the global fragmentation may serve to limit inter-functional crosstalk at the transcriptional level. The elevated clustering and feedback content probably implement differentiative and homeostatic requirements. Single-gene feedback circuits are predominant (this study and ref. 9) and may have been selected through evolution for several reasons: (i) they decrease the biosynthetic cost (roughly proportional to the amount of transcripts and proteins to be produced), (ii) together with the small diameter, they reduce the response delay (often a consequence of macromolecular synthesis) and (iii) they stabilize the fluctuations of expression of the involved genes (ref. 23). Similar laws seem to govern the local and global network topologies in eukaryotes and prokaryotes, notwithstanding the circuit sign. When prior knowledge of the specific transcriptional connections is limited, these laws may prove general enough to facilitate the integration of transcriptomic data into dynamic models of genetic networks.
Note: Supplementary information is available on the Nature Genetics website.
References
1. DeRisi, J.L., Iyer, V.R. & Brown, P.O. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686 (1997).
2. Svetlov, V.V. & Cooper, T.G. Review: compilation and characteristics of dedicated transcription factors in Saccharomyces cerevisiae. Yeast 11, 1439–1484 (1995).
3. Holstege, F.C.P. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728 (1998).
4. Erdős, P. & Rényi, A. On random graphs. Publicationes Mathematicae 6, 290–297 (1959).
5. Watts, D.J. & Strogatz, S.H. Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998).
6. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N. & Barabasi, A.-L. The large-scale organization of metabolic networks. Nature 407, 651–654 (2000).
7. Fell, D.A. & Wagner, A. The small world of metabolism. Nature Biotech. 18, 1121–1122 (2000).
8. Raine, D.J. & Norris, V. Network structure of metabolic pathways. Interjournal Complex System #361 (International Conference on Complex Systems, Nashua, New Hampshire, 21–26 May 2000).
9. Thieffry, D., Huerta, A.M., Pérez-Rueda, A. & Collado-Vides, J. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. Bioessays 20, 433–440 (1998).
10. Karp, P.D. Pathway databases: a case study in computational symbolic theories. Science 293, 2040–2044 (2001).
11. Albert, R., Jeong, H. & Barabasi, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).
12. Ren, B. et al. Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309 (2000).
13. Iyer, V.R. et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001).
14. Lieb, J.D., Liu, X., Botstein, D. & Brown, P.O. Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nature Genet. 28, 327–334 (2001).
15. Simon, I. et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697–708 (2001).
16. Thomas, R. & D'Ari, R. Biological Feedback (CRC, Boca Raton, 1990).
17. Lorenz, M.C. & Heitman, J. The MEP2 ammonium permease regulates pseudohyphal differentiation in Saccharomyces cerevisiae. EMBO J. 17, 1236–1247 (1998).
18. Vershon, A.K. & Pierce, M. Transcriptional regulation of meiosis in yeast. Curr. Opin. Cell Biol. 12, 334–339 (2000).
19. Rogers, B. et al. The pleiotropic drug ABC transporters from Saccharomyces cerevisiae. J. Mol. Microbiol. Biotechnol. 3, 207–214 (2001).
20. Carlson, M. Glucose repression in yeast. Curr. Opin. Microbiol. 2, 202–207 (1999).
21. Huang, M., Zhou, Z. & Elledge, S.J. The DNA replication and damage checkpoint pathways induce transcription by inhibition of the Crt1 repressor. Cell 94, 595–605 (1998).
22. Zhang, L. & Hach, A. Molecular mechanisms of heme signaling in yeast: the transcriptional activator Hap1 serves as the key mediator. Cell. Mol. Life Sci. 56, 415–426 (1999).
23. Becskei, A. & Serrano, L. Engineering stability in gene networks by autoregulation. Nature 405, 590–593 (2000).
Acknowledgements
We thank M.-H. Mucchielli for help with the statistical analysis and M. Gromov, V. Norris and B. Prum for critically reading the manuscript. This work was supported by funding from CNRS and Conseil Régional d'Ile de France.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Guelzim, N., Bottani, S., Bourgine, P. et al. Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 31, 60–63 (2002). https://doi.org/10.1038/ng873
DOI: https://doi.org/10.1038/ng873