Prevalence of transcription promoters within archaeal operons and coding sequences
Tie Koide1,ab, David J Reiss1,a, J Christopher Bare1, Wyming Lee Pang1, Marc T Facciotti1,2, Amy K Schmid1, Min Pan1, Bruz Marzolf1, Phu T Van1, Fang-Yin Lo1, Abhishek Pratap1, Eric W Deutsch1, Amelia Peterson3, Dan Martin1,3 & Nitin S Baliga1,4
1. Institute for Systems Biology, Seattle, WA, USA
2. Department of Biomedical Engineering and UC Davis Genome Center, One Shields Avenue, University of California, Davis, CA, USA
3. Divisions of Human Biology and Clinical Research, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
4. Departments of Microbiology, and Molecular and Cellular Biology, University of Washington, Seattle, WA, USA
Correspondence to: Nitin S Baliga1,4 Institute for Systems Biology, Departments of Microbiology, and Molecular and Cellular Biology, University of Washington, 1441 N 34th Street, Seattle, WA 98103, USA. Tel.: +1 206 732 1266; Fax: +1 206 732 1299; Email: nbaliga@systemsbiology.org
Received 20 November 2008; Accepted 13 May 2009; Published online 16 June 2009
aThese authors contributed equally to this work
bPresent address: Departamento de Bioquímica e Imunologia, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil. E-mail: tiekoide@gmail.com
Article highlights
- A systematic evaluation of transcription factor binding site loci (TFBS) for nearly 10% of all TFs in Halobacterium salinarum NRC-1 via ChIP-chip demonstrated that a significant fraction of TFBS loci (as many as ~10% of multi-TFBS loci for 11 TFs) fell within coding regions.
- By correlating the dynamic changes in the transcriptome structure (TS) of H. salinarum NRC-1 during a complex cellular response with genome-wide binding locations of TFs and peptides from proteomics experiments, we have (i) characterized transcription start sites and termination sites for ~64% of all genes in this organism; and discovered (ii) new protein coding genes, (iii) 61 novel ncRNA candidates, (iv) 5' and 3' untranslated regions (UTRs) of mRNAs, (v) a large mRNA population with variable 3' end locations, and (vi) transcripts with extensive overlaps in their 3' termini.
- By integrating TFBS locations with the TS, we demonstrate that a significant number of TF binding events inside coding regions are indeed functional and have important consequences, such as mediating the conditional modulation of at least 43% of all investigated operons (P < 10^-9).
- These findings suggest that the construction of a mechanistically accurate model of a gene regulatory network would have to consider operons, promoters, and terminators as dynamically changing elements.
Synopsis
Evidence is mounting that the standard model of transcription factor (TF) binding to intergenic regions is not always the rule. Although there is isolated prior evidence for functional consequences of TF binding inside coding sequences, this issue had not been systematically evaluated genome wide. We have conducted a study to investigate the genome-wide consequence of internal TF binding for nearly 10% of all TFs in an archaeal extremophile, Halobacterium salinarum NRC-1. We show that a significant number of TF-binding sites (TFBS) inside the coding sequences are functional and have marked consequences, such as by conditionally modulating the architecture of at least 43% of all operons in this organism. We present the integrated analysis of complementary systems-wide data on TFBS locations and dynamic modulation of transcriptome structure that led to this striking discovery.
Using ChIP–chip and the MeDiChI algorithm (Reiss et al, 2008), we precisely located TFBSs and determined their corresponding local false discovery rates (LFDRs) from new and previously reported genome-wide ChIP–chip measurements for 11 TFs: all TFBs (TFBa, TFBb, TFBc, TFBd, TFBe, TFBf and TFBg), one TBP (TBPb) and three transcriptional regulators (TRs) (Trh3, Trh4, VNG1451C) in H. salinarum NRC-1. Our conclusion from this analysis was that as many as 10% of all multi-TFBS loci were within coding regions.
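The step of deciding whether a binding site falls inside a coding region can be illustrated with a minimal Python sketch. It is not the MeDiChI algorithm and does not estimate LFDRs; it only shows the interval lookup, assuming each site is reduced to a single coordinate and that genes are non-overlapping, inclusive `(start, end)` intervals. The function name `classify_sites` and all coordinates are hypothetical.

```python
from bisect import bisect_right

def classify_sites(sites, genes):
    """Label each binding-site coordinate as 'coding' or 'intergenic'.

    sites: iterable of integer genome positions (hypothetical peak centers).
    genes: list of (start, end) tuples, inclusive; assumed non-overlapping.
    """
    genes = sorted(genes)
    starts = [start for start, _ in genes]
    labels = {}
    for pos in sites:
        # Index of the last gene that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        inside = i >= 0 and pos <= genes[i][1]
        labels[pos] = "coding" if inside else "intergenic"
    return labels

# Made-up coordinates, for illustration only.
genes = [(100, 400), (550, 900), (1200, 1500)]
sites = [50, 250, 450, 900, 1000]
labels = classify_sites(sites, genes)
coding_fraction = sum(v == "coding" for v in labels.values()) / len(labels)
print(labels)
print(f"fraction inside coding regions: {coding_fraction:.2f}")
```

A real annotation contains overlapping genes on both strands, so a production version would need an interval tree or a strand-aware lookup rather than this single sorted list.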
To show that these TFBS have significant functional consequences on transcriptional regulation and cellular physiology, we used high-density genome tiling arrays to analyze the transcriptome structure (TS) of H. salinarum NRC-1 at different phases of growth in a batch culture, which is associated with differential regulation of over 65% of all genes. Through this analysis we assigned transcription start sites (TSSs) to 64% of all annotated genes, termination sites (TTSs) to 46% of the genes, verified the expression of 203 operons and discovered 5' and 3' UTRs for 65% of all genes and operons. Further, by correlating the transcribed units with chromosomal coordinates of predicted genes (Ng et al, 2000) and experimentally mapped peptides from large-scale proteomics studies (Van et al, 2008), we revised the translation start site for 61 genes, detected 10 new protein-coding genes, and discovered 61 new putative ncRNAs. Although the physiological roles and mechanisms of action of specific ncRNAs remain to be uncovered, the bimodal distribution of correlations between the expression of ncRNAs and that of their antisense strands is consistent with the characterized roles of ncRNAs in the regulation of their cognate antisense transcripts. Finally, this analysis also showed a large mRNA population that has variable 3'-end locations and transcripts with extensive overlaps in their 3' termini.
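The correlation between an ncRNA and its antisense transcript can be sketched in a few lines of Python. The synopsis does not state which correlation measure was used, so the Pearson coefficient here is an assumption, and the expression profiles are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)

# Hypothetical profiles across four growth-phase time points.
ncrna = [1.0, 2.0, 3.0, 4.0]
antisense_co_regulated = [2.0, 4.0, 6.0, 8.0]    # rises with the ncRNA
antisense_anti_regulated = [8.0, 6.0, 4.0, 2.0]  # falls as the ncRNA rises

r_pos = pearson(ncrna, antisense_co_regulated)
r_neg = pearson(ncrna, antisense_anti_regulated)
print(r_pos, r_neg)
```

Computed over all ncRNA and antisense pairs, a bimodal distribution would appear as two separate clusters of such coefficients rather than a single peak around zero.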
By integrating TFBS locations with the TS, we identified internal binding sites that are functional in the conditional modulation of operon organization. We assessed the global prevalence of such operons by devising a quantitative measure for classifying operons as conditional. Specifically, by integrating probe intensities of transcripts hybridized to the genome tiling array with gene-expression correlations derived from expression analysis of H. salinarum NRC-1 in 719 microarray experiments, we found that 43% of all operons are conditionally modulated. Remarkably, there was a strong functional link between transcription-factor binding inside operons and their classification as 'conditional' (P < 10^-9). We transcriptionally fused two of these conditionally activated promoters inside coding sequences to a reporter gene encoding a fast-degrading GFP variant optimized for the high-salt cytoplasm of halophilic archaea. FACS analysis of cells harboring these internal promoter–reporter transcriptional fusions provided in vivo validation of growth-phase regulated transcription initiation inside coding sequences.
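One common way to test an association of this kind is a one-sided hypergeometric (Fisher-type) enrichment test. The sketch below is illustrative only: the synopsis does not name the test behind the reported P value, the function name `hypergeom_enrichment_p` is hypothetical, and the counts are made up rather than taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(k, K, n, N):
    """One-sided enrichment p-value P(X >= k) under the hypergeometric null.

    N: total operons examined
    K: operons with an internal TF-binding site
    n: operons classified as conditional
    k: operons that are both conditional and internally bound
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Made-up counts, NOT from the paper: 20 operons, 8 with internal sites,
# 9 conditional, 7 in both groups.
p_value = hypergeom_enrichment_p(k=7, K=8, n=9, N=20)
print(f"one-sided p = {p_value:.4g}")
```

A small p-value indicates that internally bound operons are classified as conditional more often than expected if the two properties were independent.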
Although earlier studies have discovered internal promoters within a single gene or operon (Tsui et al, 1994; Guillot and Moran, 2007), we have significantly extended these findings to a genome-wide scale to show that biologically meaningful promoters do exist inside coding sequences at a frequency that is much higher than was previously appreciated. Further, this discovery also shows how a simple prokaryote can use the same set of genes in different combinations to elicit complex responses according to an environmental challenge.
Irrespective of the specific underlying mechanisms, our observations of widespread modulation of operon architecture and of transcription initiation and termination inside genes all constitute evidence that archaea can intersperse regulatory logic within their coding sequence and thus blur the boundaries between coding and non-coding elements. We have shown that it is possible to use new high-throughput technologies to find these biologically important instances where transcriptional regulation does occur within coding sequences and, furthermore, that it is possible to globally characterize specific regulatory mechanisms responsible for these phenomena. Combined with new high-throughput sequencing technologies, our results will expand the view of genetic-information processing that can be investigated at high resolution (Nagalakshmi et al, 2008; Wilhelm et al, 2008). These data will enable construction of mechanistically accurate models for reliable systems re-engineering of biological circuits. Moreover, these findings suggest that the incorporation of mechanistic accuracy into GRN models would require operons, promoters, and terminators to be treated as dynamic entities.
Acknowledgements
Thanks to Kenia Whitehead and Sacha Coesel for helpful discussions, Dan Tenenbaum for the construction of the webpage and Kenichi Masumura for help in the growth-curve experiments. This work was supported by grants from NIH (P50GM076547 and 1R01GM077398-01A2), DoE (MAGGIE: DE-FG02-07ER64327 and DE-FG02-07ER64327), NSF (EF-0313754, EIA-0220153, MCB-0425825, DBI-0640950) and NASA (NNG05GN58G) to NSB.
References
- Guillot C, Moran Jr CP (2007) Essential internal promoter in the spoIIIA locus of Bacillus subtilis. J Bacteriol 189: 7181–7189
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320: 1344–1349
- Ng WV, Kennedy SP, Mahairas GG, Berquist B, Pan M, Shukla HD, Lasky SR, Baliga NS, Thorsson V, Sbrogna J, Swartzell S, Weir D, Hall J, Dahl TA, Welti R, Goo YA, Leithauser B, Keller K, Cruz R, Danson MJ et al (2000) Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci USA 97: 12176–12181
- Reiss DJ, Facciotti MT, Baliga NS (2008) Model-based deconvolution of genome-wide DNA binding. Bioinformatics 24: 396–403
- Tsui HC, Zhao G, Feng G, Leung HC, Winkler ME (1994) The mutL repair gene of Escherichia coli K-12 forms a superoperon with a gene encoding a new cell-wall amidase. Mol Microbiol 11: 189–202
- Van PT, Schmid AK, King NL, Kaur A, Pan M, Whitehead K, Koide T, Facciotti MT, Goo YA, Deutsch EW, Reiss DJ, Mallick P, Baliga NS (2008) Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage. J Proteome Res 7: 3755–3764
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453: 1239–1243


