Synopsis

Subject Categories: Computational methods | Chromatin & Transcription

Molecular Systems Biology 2 Article number: 2006.0012  doi:10.1038/msb4100054
Published online: 18 April 2006
Citation: Molecular Systems Biology 2:2006.0012



There is a News and Views associated with this article.

Deciphering principles of transcription regulation in eukaryotic genomes

Dat H Nguyen (1) & Patrik D'haeseleer (1,2)

  1. Department of Genetics, Harvard Medical School, Boston, MA, USA
  2. Biosciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA

Correspondence to: Dat H Nguyen, Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, NBR 238, Boston, MA 02115, USA. Tel.: +1 617 335 8439; Fax: +1 617 432 6513; E-mail: dnguyen@genetics.med.harvard.edu

Received 8 April 2005; Accepted 8 February 2006; Published online 18 April 2006

Article highlights

  • We present a deterministic mathematical framework for deriving principles of transcription regulation.
  • We identify four classes of regulatory principles, all validated by expression data: short-range, mid-range, long-range, and orientation-dependent effects of motifs on gene expression levels.
  • Using the PAC and RRPE motifs as an example, we illustrate how evolution can use these principles as an additional dimension to amplify the combinatorial power of a small set of CREs in regulating transcription.

Synopsis

Transcription regulation plays a critical role in the development, complexity, diversity, and homeostasis of all living organisms (Davidson, 2001; Levine and Tjian, 2003), as transcription is the first step in the universal pipeline of biological information flow from the genome, where all genetic programs are stored, to the proteome, through which these programs are executed. Understanding the principles underlying the quantitative control of transcription is therefore a fundamental objective of quantitative biology, yet these principles remain poorly understood. Although transcription can be controlled at different levels (e.g., the level of chromatin structure), at the most fundamental level, first discovered by Jacob and Monod (1961), the production of transcripts of a given gene is determined by the complex combinatorial interplay of cis-regulatory elements (CREs, or motifs) present in the gene's promoter region and the associated regulatory proteins (transcription factors, TFs) present in the cell. Because TFs are gene products themselves, transcription of a gene is fundamentally regulated by the set of motifs present in its promoter. The principles that govern transcription regulation can thus be defined by a quantitative description of how motif strength (that is, the motif's influence on gene expression) depends on promoter context.

In spite of major efforts aimed at identifying motifs in different species using a variety of approaches and analyzing their precise influence on gene expression (McGuire and Church, 2000; McGuire et al, 2000; Stormo, 2000; Bussemaker et al, 2001; Pilpel et al, 2001; Sudarsanam et al, 2002; Guhathakurta et al, 2002a, 2002b; Beer and Tavazoie, 2004; Siggia, 2005; Tompa et al, 2005; Xie et al, 2005), little is known about the principles by which a gene's motifs translate into an expression level. In other words, quantitative effects of motifs on gene expression as a function of their promoter context remain poorly understood. Here we present a deterministic mathematical strategy, the motif expression decomposition (MED) method, that provides a framework for deriving principles of transcription regulation at the single gene level. The main feature of the MED method is that, unlike other methods used to measure the effect of motifs on gene expression (Pilpel et al, 2001; Beer and Tavazoie, 2004), MED provides a metric to infer from genome-wide expression data both the context-dependent influence of each motif on gene expression and the levels of activity of each TF under a set of environmental conditions. In addition, it operates on all genes in a genome without requiring any a priori knowledge of gene cluster/module membership or manual tuning of parameters.

Applying MED to yeast Saccharomyces cerevisiae transcriptional networks using a combined gene expression data set covering 255 conditions involving different environmental stresses (Gasch et al, 2000) and multiple stages of the cell cycle (Spellman et al, 1998), we found that motif strength can have a complex dependence on the motif's geometry—one of the attributes of promoter context—such as distance from the translation start site or motif orientation. We identify four classes of regulatory principles, all of which were validated by expression data (Figure 3). These are short-range, mid-range, long-range, and orientation-dependent effects of motifs with respect to gene expression levels. In addition, we illustrate how evolution can use these principles as an additional dimension to amplify the combinatorial power of a small set of CREs in regulating transcription (Figure 4) using the PAC and RRPE motifs as an example.

Figure 3

Four classes of transcriptional regulatory principles in S. cerevisiae. These graphs illustrate the dependency of motif strength on motif geometric constraints for the PAC (A, blue curve), RRPE (A, red curve), MCB (C), and the RAP1 (E) motifs. The position of the start codon is indicated by 'ATG'. To form instances of a gene ensemble containing each of these motifs (see Materials and methods section), the motif distance relative to ATG is binned with a bin size of 150 bp, except the last bin with a bin size of 250 bp. The average motif strength of each of these motif-containing gene ensemble instances (see Supplementary information 3 for the distribution of correlation coefficients) is plotted in the middle of each bin, with the error bar indicating the standard error of the average. Panels (B), (D), and (F) show the degree of gene coexpression, as measured by the average pairwise expression correlation, for the corresponding motif-containing gene ensemble instances for panels (A), (C), and (E). For MCB and RAP1 motifs, their orientational effects (5' represented in red, 3' represented in blue) on gene expression are also presented in addition to their geometric constraint.
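
The binning and averaging described in this caption can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the 1000 bp promoter window (five 150 bp bins plus a final 250 bp bin), the half-open bin boundaries, and the names `BIN_EDGES`, `bin_index`, and `binned_mean_and_sem` are all hypothetical, and the motif strengths are assumed to have been computed beforehand.

```python
import math
from collections import defaultdict

# Assumed bin edges in bp upstream of ATG: five 150 bp bins plus a final
# 250 bp bin, covering an assumed 1000 bp promoter window.
BIN_EDGES = [0, 150, 300, 450, 600, 750, 1000]


def bin_index(distance_bp):
    """Return the bin index for a motif distance upstream of ATG, or None."""
    for i in range(len(BIN_EDGES) - 1):
        # Half-open intervals [lower, upper) are an assumption of this sketch.
        if BIN_EDGES[i] <= distance_bp < BIN_EDGES[i + 1]:
            return i
    return None  # outside the assumed promoter window


def binned_mean_and_sem(records):
    """Summarize motif strength per distance bin.

    records: iterable of (distance_bp, motif_strength) pairs, one per gene.
    Returns {bin_midpoint: (mean, standard_error)} for bins with >= 2 genes.
    """
    groups = defaultdict(list)
    for distance_bp, strength in records:
        idx = bin_index(distance_bp)
        if idx is not None:
            groups[idx].append(strength)

    summary = {}
    for idx, values in sorted(groups.items()):
        n = len(values)
        if n < 2:
            continue  # the standard error is undefined for a single value
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
        midpoint = (BIN_EDGES[idx] + BIN_EDGES[idx + 1]) / 2
        summary[midpoint] = (mean, math.sqrt(variance / n))
    return summary
```

Each key of the returned dictionary is the middle of a bin, which is where the caption says the average is plotted, and the second element of each value would serve as the error bar.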

Figure 4

The analysis of the gene ensemble that contains both the PAC and RRPE motifs. (A) Relative distance of PAC and RRPE to ATG is binned into three bins: [-150,ATG], [-300,-150], and [-1000,-300] bp, forming a total of nine PAC/RRPE-containing gene ensemble instances for nine combinations of promoter structures. Average motif strength is plotted against the position of PAC (along the x-axis) and RRPE (different curves). The black diamond curve represents motif strength averaged over all genes in three ensemble instances corresponding to three binned positions of RRPE (see Supplementary information 2 for motif strength averaged over all genes in three ensemble instances corresponding to three binned positions of PAC). In (B), predicted PAC and RRPE motif strengths are shown as a function of their relative order with respect to ATG (5'-RRPE-PAC-ATG and 5'-PAC-RRPE-ATG). For the 5'-RRPE-PAC-ATG ensemble instance, the magnitude of the PAC motif strength is about three times higher than the instance that contains these motifs in the reverse order, consistent with the corresponding degree of gene coexpression. Nevertheless, the motif strength of RRPE motif is insignificant regardless of motif order. In (C) and (D), the positions of PAC and RRPE motifs relative to ATG of gene promoters that contain them are presented. In (C), the location of each motif in RRPE-only (red), PAC-only (blue), and PAC/RRPE-only containing gene ensembles is represented by a filled circle of appropriate color located at a position it occurs in gene promoter relative to ATG. The choice of these 'only' ensembles is discussed in Supplementary information 2. Likewise, in (D), the positions of PAC and RRPE are plotted for PAC/RRPE gene ensemble with two different motif order arrangements. Data in (A) and (B) clearly indicate that there is no actual synergism between PAC and RRPE. 
This finding appears to contradict the synergistic behavior of PAC and RRPE suggested in Supplementary information 2 and earlier work (Beer and Tavazoie, 2004). However, the data shown in (C) and (D) and the average pairwise expression correlation coefficient data in Table 1 not only confirm MED's prediction but also illustrate nature's use of geometry as another dimension for regulating transcription.
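
As a rough illustration of the grouping in panel (A) and of the coexpression measure mentioned in the caption, the sketch below assigns genes to the nine PAC/RRPE position combinations and computes an average pairwise Pearson correlation for a set of expression profiles. It is a minimal sketch under assumptions, not the published procedure: the gene record layout (`pac_dist`, `rrpe_dist`, `profile`), the half-open bin boundaries, the choice of Pearson correlation, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# The three position bins from the caption, as bp upstream of ATG.
POSITION_BINS = [(0, 150), (150, 300), (300, 1000)]


def position_bin(distance_bp):
    """Return the index of the position bin, or None if outside all bins."""
    for i, (lower, upper) in enumerate(POSITION_BINS):
        if lower <= distance_bp < upper:  # half-open bins are an assumption
            return i
    return None


def ensemble_instances(genes):
    """Group genes by (PAC bin, RRPE bin), giving up to nine instances.

    genes: list of dicts with hypothetical keys 'pac_dist', 'rrpe_dist'
    (bp upstream of ATG) and 'profile' (expression values per condition).
    """
    instances = defaultdict(list)
    for gene in genes:
        key = (position_bin(gene["pac_dist"]), position_bin(gene["rrpe_dist"]))
        if None not in key:
            instances[key].append(gene)
    return dict(instances)


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)  # raises ZeroDivisionError if a profile is constant


def average_pairwise_correlation(profiles):
    """Mean Pearson correlation over all pairs of profiles; None if < 2."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return None
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Calling `average_pairwise_correlation` on the profiles of each group returned by `ensemble_instances` would give one coexpression value per promoter arrangement, which could then be compared with the average motif strength of the same group.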

Acknowledgements

We thank Nikos Reppas, Zhou Zhu, Xiaoxia Lin, Dana Pe'er, Saeed Tavazoie, Eric Siggia, and Joel Bader for critical reading of the manuscript. We thank John Aach for critical reading of the manuscript and useful suggestions on statistical tests. We are indebted to George M Church for his guidance and support of this work. Dat H Nguyen acknowledges support from the Alfred P Sloan and US Department of Energy Postdoctoral Fellowship in Computational Molecular Biology and Bioinformatics, and travel fellowships provided by the National Science Foundation Institute for Pure and Applied Mathematics at UCLA. George M Church was supported by US Department of Energy GTL Grant No. DE-FG02 02ER63461. PD was supported by a PhRMA/Harvard CEIGI grant, and is currently supported by an LDRD grant at Lawrence Livermore National Laboratory.

References

  1. Beer MA, Tavazoie S (2004) Predicting gene expression from sequence. Cell 117: 185–198
  2. Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element detection using correlation with expression. Nat Genet 27: 167–171
  3. Davidson EH (2001) Genomic Regulatory Systems: Development and Evolution. San Diego: Academic Press
  4. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 11: 4241–4257
  5. Guhathakurta D, Palomar L, Stormo GD, Tedesco P, Johnson TE, Walker DW, Lithgow G, Kim S, Link CD (2002a) Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res 12: 701–712
  6. Guhathakurta D, Schriefer LA, Hresko MC, Waterston RH, Stormo GD (2002b) Identifying muscle regulatory elements and genes in the nematode Caenorhabditis elegans. Pac Symp Biocomput 425–436
  7. Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3: 318–356
  8. Levine M, Tjian R (2003) Transcription regulation and animal diversity. Nature 424: 147–151
  9. McGuire AM, Church GM (2000) Predicting regulons and their cis-regulatory motifs by comparative genomics. Nucleic Acids Res 15: 4523–4530
  10. McGuire AM, Hughes JD, Church GM (2000) Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res 10: 744–757
  11. Pilpel Y, Sudarsanam P, Church GM (2001) Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 29: 153–159
  12. Siggia ED (2005) Computational methods for transcriptional regulation. Curr Opin Genet Dev 15: 214–221
  13. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9: 3273–3297
  14. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16: 16–23
  15. Sudarsanam P, Pilpel Y, Church GM (2002) Genome-wide co-occurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in Saccharomyces cerevisiae. Genome Res 12: 1723–1731
  16. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Regnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23: 137–144
  17. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338–345
