Synopsis

Subject Categories: Bioinformatics | RNA

Molecular Systems Biology 5 Article number: 268  doi:10.1038/msb.2009.24
Published online: 28 April 2009
Citation: Molecular Systems Biology 5:268

Discovering structural cis-regulatory elements by modeling the behaviors of mRNAs

Barrett C Foat1 & Gary D Stormo1

  1. Department of Genetics, Center for Genome Sciences, Washington University School of Medicine, St Louis, MO, USA

Correspondence to: Gary D Stormo1 Department of Genetics, Washington University School of Medicine, 4444 Forest Park Ave., Campus Box 8510, St Louis, MO 63108, USA. Tel.: +314 747 5534; Fax: +314 362 2156; Email: stormo@genetics.wustl.edu

Received 27 October 2008; Accepted 17 March 2009; Published online 28 April 2009

Article highlights

  • We present a novel, alignment-free method (StructRED) that discovers secondary structure-defined cis-regulatory elements in mRNAs by modeling the effects that their occurrences exert on quantitative measurements of mRNA behavior in the form of microarray data.
  • We accurately recover the known stem-loop binding specificities of the Drosophila RNA-binding protein Smaug and the S. cerevisiae protein Vts1p from mRNA sequences and microarray data.
  • We report other putative structure-sequence specificities for RNA-binding proteins that likely play diverse roles in Drosophila and humans.
  • We find that our discovered secondary structure-defined cis-regulatory elements exist in coding sequences in addition to untranslated regions.

Synopsis

Gene expression is regulated at each step from chromatin remodeling through translation and degradation. Yet, most efforts to understand the regulation of gene expression have been focused on transcription and DNA-binding regulatory proteins. Although regulatory RNAs have received appreciable attention (Bushati and Cohen, 2007; Coppins et al, 2007), regulatory elements within mRNAs that are recognized by nucleic acid-binding proteins have been largely ignored until recently (Keene, 2007). This state exists despite observations that suggest changes in mRNA stability may account for half of the changes in mRNA expression in some cells and conditions (Fan et al, 2002; Cheadle et al, 2005). Moreover, it is a mathematical certainty that mRNAs of average stability can only be rapidly downregulated by altering the mRNA decay rate (see Pérez-Ortín et al, 2007 for derivation). Thus, one way to execute rapid, large-scale gene expression responses to unpredictable environmental stimuli is through decay-regulating RNA-binding proteins (RBPs), whose activity can be rapidly modulated post-transcriptionally. Early metazoan embryogenesis also requires mRNA stability and translation regulation to orchestrate the activities of maternally deposited transcripts (for review see Vardy and Orr-Weaver, 2007).
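The kinetic argument can be illustrated with the standard first-order model of mRNA abundance. This model is an assumption made here for illustration and may differ in detail from the cited derivation; the symbols m (mRNA level), k_s (synthesis rate), and k_d (decay rate constant) do not appear in the original text.

```latex
\begin{align}
\frac{dm}{dt} &= k_s - k_d\, m, & m^{*} &= \frac{k_s}{k_d} \quad \text{(steady state)} \\
m(t) &= m^{*} e^{-k_d t} & &\text{(after } k_s \to 0 \text{ at } t = 0\text{)} \\
t_{1/2} &= \frac{\ln 2}{k_d}
\end{align}
```

Under this model, even a complete shutoff of transcription lowers the mRNA level no faster than the half-life ln 2 / k_d allows, so a faster decrease requires an increase in k_d.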

Despite the potential importance of RNA secondary structures as binding sites for regulatory RBPs, computational methods for their discovery have failed to keep pace with current functional genomics technology (e.g. microarrays). Now, well into the era of functional genomics, RNA structure finding algorithms are still sequence-only methods, having so far failed to use the data-integrative approaches that are becoming increasingly common for the discovery of DNA-binding protein specificities (Bussemaker et al, 2001, 2007; Foat et al, 2005, 2006).

In this work, we present a novel, alignment-free method that discovers secondary structure-defined cis-regulatory elements (SCREs) in mRNAs by modeling the effects that their occurrences exert on quantitative measurements of mRNA behavior in the form of microarray data. This process is embodied in a regression-based algorithm called structural cis-regulatory element detector (StructRED). The method defines an RNA structure search space, which is small stem–loop structures in this version, and then exhaustively scores all short nucleotide sequences within the structural context for how well their occurrences explain observed microarray measurements. Through an iterative process, a multivariate model consisting of SCRE-derived mRNA sequence scores is developed to explain the input microarray data. The output of the method is a list of putative SCRE weight matrices and the inferred post-transcriptional regulatory activities of the unknown trans-factors across all of the input microarray conditions (trans-factor activity profiles, TFAPs).
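The general idea of scoring sequences within a structural context against expression data can be sketched as follows. This is a simplified illustration and not the StructRED implementation: it counts exact loop sequences instead of fitting weight matrices, fits one motif at a time by ordinary least squares instead of building a multivariate model, and uses a fixed stem and loop length. All function names, the pairing rules, and the default lengths are assumptions made for this sketch.

```python
from itertools import product

# Watson-Crick and G-U wobble pairs (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_loops(seq, stem=3, loop=4):
    """Yield the loop sequence of every window that can form a stem-loop."""
    width = 2 * stem + loop
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Base k of the 5' arm must pair with base -(k + 1) of the 3' arm.
        if all((window[k], window[-(k + 1)]) in PAIRS for k in range(stem)):
            yield window[stem:stem + loop]


def count_motif(seq, motif, stem=3, loop=4):
    """Count stem-loops in seq whose loop sequence equals motif."""
    return sum(1 for lp in hairpin_loops(seq, stem, loop) if lp == motif)


def slope_and_r2(x, y):
    """Least-squares fit y = a + b * x; return (b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def best_loop_motif(seqs, values, stem=3, loop=4):
    """Exhaustively score every loop sequence; return (motif, slope, r2)."""
    best = (None, 0.0, 0.0)
    for letters in product("ACGU", repeat=loop):
        motif = "".join(letters)
        counts = [count_motif(s, motif, stem, loop) for s in seqs]
        b, r2 = slope_and_r2(counts, values)
        if r2 > best[2]:
            best = (motif, b, r2)
    return best
```

Calling `best_loop_motif(seqs, values)` with a list of mRNA sequences (RNA alphabet) and one measurement per mRNA returns the loop sequence whose stem-loop occurrences best explain the measurements, together with the fitted slope and the fraction of variance explained.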

We accurately recover the known stem–loop binding specificities of the RNA-binding proteins Smaug in Drosophila and Vts1p in S. cerevisiae using mRNA sequences and microarray data. The computationally inferred behavior of the Smaug protein across several microarray experiments profiling mRNA levels and translation in developing Drosophila embryos (Pilot et al, 2006; Qin et al, 2007; Tadros et al, 2007; GEO accessions GSE8910, GSE3955, GSE5430) indicates that Smaug represses translation of its target mRNAs during the first 2 h of embryogenesis and promotes the degradation of its target transcripts starting at about 2 h of development. Our genome-wide inferences are consistent with the detailed observations of Smaug destabilizing Hsp83 mRNAs (Semotok et al, 2005) and translationally repressing nanos mRNAs (Dahanukar et al, 1999; Smibert et al, 1999).
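One simple way to picture a trans-factor activity profile is as a regression slope computed separately for each condition. The sketch below assumes a single SCRE score per mRNA and a univariate fit per condition, which is a simplification of the multivariate model described above; the function names and condition labels are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def activity_profile(scores, measurements_by_condition):
    """Map each condition to the slope of its measurements on the SCRE scores.

    scores: one SCRE score per mRNA.
    measurements_by_condition: condition label -> one value per mRNA,
    in the same mRNA order as scores.
    """
    return {
        condition: ols_slope(scores, values)
        for condition, values in measurements_by_condition.items()
    }
```

A negative slope in a condition means that mRNAs with higher SCRE scores have lower measured values there, which is the pattern expected for a destabilizing or repressing factor that is active in that condition.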

In addition to the Smaug SCREs, we discovered six other putative SCREs in Drosophila, which we have labeled Dm1 through Dm6, that have coherent supporting TFAPs and annotation (Figure 4). First, Dm1 and Dm2 were discovered from an mRNA expression microarray time course for Drosophila embryogenesis (Tadros et al, 2007). Those transcripts that contain high-affinity instances of Dm1 and Dm2 are expressed at decreasing levels as development proceeds, suggesting that they are involved in destabilizing these transcripts at specific developmental stages. The Dm3 and Dm4 specificities were detected using microarray data that compared expression in wild-type flies and flies lacking the Kep1 RNA-binding protein (GEO accession GSE6086), suggesting that they may represent the specificity of Kep1. Finally, Dm5 and Dm6 both were detected using polysome association data from the early Drosophila embryo (Qin et al, 2007; GEO accession GSE5430), suggesting that they are involved in regulating translation during embryogenesis.

Figure 4

Putative Drosophila structural cis-regulatory elements. (A) The structural logos of the six putative Drosophila SCREs. (B) Dm1, Dm2, Dm3, and Dm4 were detected using mRNA expression microarray data. Dm1 and Dm2 had strong negative correlations with mRNA levels over early Drosophila development. Dm1 and Dm2 did not correlate with mRNA levels in similarly treated Δsmg eggs (not shown). Dm3 and Dm4 correlated with mRNA levels changing between wild-type and Δkep1 flies (GEO accession GSE6086), suggesting that Dm3 and Dm4 may reflect the specificity of Kep1, an RNA-binding protein. (C) Dm5 and Dm6 were detected from microarray data measuring mRNA association with ribosomes in early Drosophila development (Qin et al, 2007). Triangles represent increasing density of sucrose gradient fractions, corresponding to increasing numbers of ribosomes.

To determine where the discovered Drosophila SCREs commonly occur in mRNAs, we scored the occurrences of each SCRE in the 5' UTR, 3' UTR, and coding sequence separately and then checked which of these mRNA regions best explained the microarray data. For Dm2, Dm3, Dm4, Dm6, and the Smaug SCREs, the explanatory SCRE occurrences appear primarily in the coding sequences (Figure 5). Dm1, Dm5, and Dm6 still have appreciable signal in the 3' UTRs, and Dm5 has signal in the 5' UTR. The frequent appearance of SCREs in coding sequences is a strong argument for including whole transcripts when searching for cis-regulatory elements.
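The region comparison can be sketched as fitting the same measurements against SCRE scores computed from each mRNA region separately and ranking the regions by variance explained. The use of r squared as the comparison statistic, the function names, and the region labels are assumptions for this illustration, not details taken from the method.

```python
def r_squared(x, y):
    """Squared Pearson correlation between x and y (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def rank_regions(region_scores, values):
    """Rank mRNA regions by how well their SCRE scores explain the values.

    region_scores: region label -> one SCRE score per mRNA.
    values: one measurement per mRNA, in the same mRNA order.
    Returns a list of (region, r_squared) pairs, best first.
    """
    fits = {name: r_squared(scores, values) for name, scores in region_scores.items()}
    return sorted(fits.items(), key=lambda item: item[1], reverse=True)
```

Comparing each region's fit with the fit for the full-length mRNA then shows where the explanatory occurrences of an SCRE tend to lie.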

Figure 5

Explanatory structural cis-regulatory element content of mRNA regions. These trans-factor activity profiles (TFAPs) are for all of the Drosophila SCREs over all of the same conditions shown in Figures 3 and 4. However, these TFAPs display how well each SCRE explained the measured RNA levels when occurrences of the SCREs are only scored in the 5' untranslated regions (UTRs), 3' UTRs, coding sequences (CDS), or full-length mRNAs. Thus, by comparing each subsequence TFAP to the full-length mRNA TFAP, one can see in which region of mRNAs functional instances of the SCRE tend to exist. Most of the SCREs have their strongest signal in the CDSs, followed by the 3' UTRs.


We applied StructRED to human microarray data that measured genome-wide RBP binding or profiled genome-wide polysome association. We discovered three SCREs with functionally coherent TFAPs. Occurrences of the Hs1 SCRE correlated with decreased translation in the metastatic colorectal cancer cell line, SW620, versus a non-metastatic cell line from the same patient, SW480, as measured in a polysome association microarray study (Provenzani et al, 2006; GEO accession GSE2509). Transcripts containing Hs2 SCREs are expressed at a lower level in U937 cells that have been exposed to phorbol 12-myristate 13-acetate (PMA) and caused to differentiate into a macrophage-like state (Kitamura et al, 2004; GEO accession GSE1783). Finally, occurrences of Hs3 in mRNAs correlate with increased association with ribosomes in human mammary epithelial cells, regardless of whether translation initiation factor 4E is overexpressed (Larsson et al, 2007; GEO accession GSE6043).

The StructRED algorithm represents a novel method for determining cis-regulatory RNA structures. Although the current implementation is limited to finding short stem–loop motifs, it may be extensible to other small structures (dsRNA, internal bulges) and perhaps more complex structures. Given its strengths, we expect that StructRED may become the basis of a class of RNA regulatory element search tools that will expand computational and experimental inquiries into post-transcriptional gene regulation. Overall, we show that structurally defined cis-regulatory elements can be discovered through integrative modeling of functional genomics and mRNA sequence data.

Acknowledgements

We thank Craig Smibert, Barak Cohen, Jim Skeath, Yue Zhao, and Ryan Christensen for critical readings of the paper. BCF is a PhRMA Foundation Informatics Fellow and a Washington University School of Medicine, Department of Genetics Fellow. This work was supported by National Institutes of Health grant HG00249 to GDS.

References

  1. Bushati N, Cohen SM (2007) microRNA functions. Annu Rev Cell Dev Biol 23: 175–205
  2. Bussemaker HJ, Foat BC, Ward LD (2007) Predictive modeling of genome-wide mRNA expression: from modules to molecules. Annu Rev Biophys Biomol Struct 36: 329–347
  3. Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element detection using correlation with expression. Nat Genet 27: 167–171
  4. Cheadle C, Fan J, Cho-Chung YS, Werner T, Ray J, Do L, Gorospe M, Becker KG (2005) Control of gene expression during T cell activation: alternate regulation of mRNA transcription and mRNA stability. BMC Genomics 6: 75
  5. Coppins RL, Hall KB, Groisman EA (2007) The intricate world of riboswitches. Curr Opin Microbiol 10: 176–181
  6. Dahanukar A, Walker JA, Wharton RP (1999) Smaug, a novel RNA-binding protein that operates a translational switch in Drosophila. Mol Cell 4: 209–218
  7. Fan J, Yang X, Wang W, Wood WH, Becker KG, Gorospe M (2002) Global analysis of stress-regulated mRNA turnover by using cDNA arrays. Proc Natl Acad Sci USA 99: 10611–10616
  8. Foat BC, Houshmandi SS, Olivas WM, Bussemaker HJ (2005) Profiling condition-specific, genome-wide regulation of mRNA stability in yeast. Proc Natl Acad Sci USA 102: 17675–17680
  9. Foat BC, Morozov AV, Bussemaker HJ (2006) Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics 22: e141–e149
  10. Keene JD (2007) RNA regulons: coordination of post-transcriptional events. Nat Rev Genet 8: 533–543
  11. Kitamura H, Nakagawa T, Takayama M, Kimura Y, Hijikata A, Ohara O (2004) Post-transcriptional effects of phorbol 12-myristate 13-acetate on transcriptome of U937 cells. FEBS Lett 578: 180–184
  12. Larsson O, Li S, Issaenko OA, Avdulov S, Peterson M, Smith K, Bitterman PB, Polunovsky VA (2007) Eukaryotic translation initiation factor 4E induced progression of primary human mammary epithelial cells along the cancer pathway is associated with targeted translational deregulation of oncogenic drivers and inhibitors. Cancer Res 67: 6814–6824
  13. Pérez-Ortín JE, Alepuz PM, Moreno J (2007) Genomics and gene transcription kinetics in yeast. Trends Genet 23: 250–257
  14. Provenzani A, Fronza R, Loreni F, Pascale A, Amadio M, Quattrone A (2006) Global alterations in mRNA polysomal recruitment in a cell model of colorectal cancer progression to metastasis. Carcinogenesis 27: 1323–1333
  15. Qin X, Ahn S, Speed TP, Rubin GM (2007) Global analyses of mRNA translational control during early Drosophila embryogenesis. Genome Biol 8: R63
  16. Semotok JL, Cooperstock RL, Pinder BD, Vari HK, Lipshitz HD, Smibert CA (2005) Smaug recruits the CCR4/POP2/NOT deadenylase complex to trigger maternal transcript localization in the early Drosophila embryo. Curr Biol 15: 284–294
  17. Smibert CA, Lie YS, Shillinglaw W, Henzel WJ, Macdonald PM (1999) Smaug, a novel and conserved protein, contributes to repression of nanos mRNA translation in vitro. RNA 5: 1535–1547
  18. Tadros W, Goldman AL, Babak T, Menzies F, Vardy L, Orr-Weaver T, Hughes TR, Westwood JT, Smibert CA, Lipshitz HD (2007) SMAUG is a major regulator of maternal mRNA destabilization in Drosophila and its translation is activated by the PAN GU kinase. Dev Cell 12: 143–155
  19. Vardy L, Orr-Weaver TL (2007) Regulating translation of maternal messages: multiple repression mechanisms. Trends Cell Biol 17: 547–554
