Systematic discovery of structural elements governing stability of mammalian messenger RNAs

Journal name: Nature
Volume: 485
Pages: 264–268
DOI: 10.1038/nature11013

Decoding post-transcriptional regulatory programs in RNA is a critical step towards the larger goal of developing predictive dynamical models of cellular behaviour. Despite recent efforts1, 2, 3, the vast landscape of RNA regulatory elements remains largely uncharacterized. A long-standing obstacle is the contribution of local RNA secondary structure to the definition of interaction partners in a variety of regulatory contexts, including—but not limited to—transcript stability3, alternative splicing4 and localization3. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (for example, human cardiac troponin T) or affects other aspects of RNA biology5. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence3, 6. Here we present a computational framework based on context-free grammars3, 7 and mutual information2 that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behaviour. By applying this framework to genome-wide human mRNA stability data, we reveal eight highly significant elements with substantial structural information, for the strongest of which we show a major role in global mRNA regulation. Through biochemistry, mass spectrometry and in vivo binding studies, we identified human HNRPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1, also known as HNRNPA2B1) as the key regulator that binds this element and stabilizes a large number of its target genes. We created a global post-transcriptional regulatory map based on the identity of the discovered linear and structural cis-regulatory elements, their regulatory interactions and their target pathways. 
This approach could also be used to reveal the structural elements that modulate other aspects of RNA behaviour.
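
To make the idea of a small structural element concrete, the following minimal Python sketch scans a sequence for a hairpin whose stem and loop satisfy degenerate nucleotide constraints. It is an illustration under stated assumptions, not the TEISER implementation: the function names (`matches_stem_loop`, `find_motif`), the motif encoding, the acceptance of G-U wobble pairs and the example sequence are all hypothetical, and a direct hairpin matcher stands in for the context-free grammar used in the paper.

```python
# Standard IUPAC degenerate nucleotide codes, written for the RNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "UC", "R": "AG", "K": "UG", "M": "AC", "S": "GC", "W": "AU",
    "B": "GUC", "D": "GAU", "H": "ACU", "V": "GCA", "N": "ACGU",
}

# Assumption: Watson-Crick pairs plus G-U wobble pairs count as paired.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def matches_stem_loop(seq, start, stem5, loop):
    """Return True if a hairpin matching the motif starts at seq[start].

    stem5: IUPAC codes for the 5' strand of the stem, outermost base first.
           The 3' strand is only required to base-pair with the 5' strand.
    loop:  IUPAC codes for the unpaired loop between the two strands.
    """
    n_stem, n_loop = len(stem5), len(loop)
    end = start + 2 * n_stem + n_loop  # one past the last base of the hairpin
    if end > len(seq):
        return False
    # Check each stem position: sequence constraint on the 5' base, then pairing.
    for i, code in enumerate(stem5):
        five = seq[start + i]
        three = seq[end - 1 - i]
        if five not in IUPAC[code] or (five, three) not in PAIRS:
            return False
    # Check the loop against its sequence constraints.
    loop_start = start + n_stem
    return all(seq[loop_start + j] in IUPAC[c] for j, c in enumerate(loop))


def find_motif(seq, stem5, loop):
    """Return every start index at which the hairpin motif matches."""
    return [i for i in range(len(seq)) if matches_stem_loop(seq, i, stem5, loop)]


# Hypothetical example: a 3-bp stem closing a 4-nt loop, embedded in flanking A's.
hits = find_motif("AAGGCUUCGGCCAA", stem5="GSN", loop="UNCG")
print(hits)
```

A motif that combines sequence and structure in this way can then be scored by asking whether the transcripts that carry it are distributed non-randomly across the stability measurements.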

Figures

  1. Discovery of RNA structural motifs informative of genome-wide transcript stability.

    Each RNA structural motif is shown (far right) along with its pattern of enrichment/depletion across the range of mRNA stability measurements throughout the genome (far left). The panel labelled mRNA stability measurements shows how the transcripts are partitioned into equally populated bins based on their stability measures, going from left (highly stable) to right (unstable). In the heatmap representation, a gold entry marks the enrichment of the given motif in its corresponding stability bin (measured by log-transformed hypergeometric P-values), while a light-blue entry indicates motif depletion in the bin. Red and blue borders mark highly significant motif enrichments and depletions, respectively. From left to right, we show the motif names, their location (UP for 5′ UTR and DN for 3′ UTR), their sequence information (‘motif’, in the form of an alphanumeric plot), their associated mutual information values (MI; see below), their frequency (the fraction of transcripts that carry at least one instance of the motif), and their z score (see below). Each MI value is used to calculate a z score, which is the number of standard deviations of the actual MI relative to MIs calculated for 1.5 million randomly shuffled stability profiles. A structural illustration of each motif is also presented (far right) using the following single letter nucleotide code: Y = [UC], R = [AG], K = [UG], M = [AC], S = [GC], W = [AU], B = [GUC], D = [GAU], H = [ACU], V = [GCA] and N = any nucleotide.

  2. The regulatory role of sRSM1.

    Whole-genome expression levels were measured in decoy-transfected samples relative to the controls transfected with scrambled RNA molecules (see Methods). The measurements were performed in duplicate, for two independent decoy/scrambled sets (the relative transcript levels were subsequently averaged across the two replicates in each set). Genes were sorted and quantized into equally populated bins based on the average log-ratio of their expression levels in the decoy samples relative to the scrambled controls. TEISER was used to show the enrichment/depletion patterns of transcripts harbouring sRSM1 in their 3′ UTRs. From left to right, we also show motif name, sequence, MI values and the associated z scores.

  3. HNRPA2B1 stabilizes transcripts through direct in vivo binding to sRSM1 structural motifs.

    a, Genome-wide expression levels were measured in HNRPA2B1 siRNA-transfected samples relative to mock-transfected controls. TEISER was used to capture the enrichment/depletion pattern of transcripts carrying sRSM1 across the relative expression values. Experiments were performed in triplicate, each with an independent siRNA targeting HNRPA2B1 and the resulting log ratios were averaged for each transcript. b, Transcript decay rates were compared in HNRPA2B1 knock-downs versus mock-transfected controls. These measurements were then analysed by TEISER to visualize the extent to which the decay rates of transcripts carrying sRSM1 elements were increased following HNRPA2B1 knock-down. c, Using ultraviolet-crosslinking followed by immunoprecipitation, mRNAs that bind HNRPA2B1 were extracted and compared against the input mRNA population (RIP-chip). The log ratio calculated for each mRNA denotes its abundance in the immunoprecipitated sample relative to the input control. Bins to the right contain the mRNAs that were captured as interacting partners with HNRPA2B1. Similar to the prior examples, TEISER was used to show the enrichment/depletion pattern of transcripts carrying sRSM1 in their 3′ UTRs. The values associated with each transcript were calculated as the average of log ratios from biological replicates. d, HNRPA2B1 binding sites were identified using immunoprecipitation followed by high-throughput sequencing (HITS-CLIP). Instances of the sRSM1 element are significantly enriched in these sites relative to a population of random sequences from 3′ UTRs that are not represented in the sequenced population.

  4. HNRPA2B1 regulates growth rate.

    a, Whole genome expression levels across five breast cancer cell lines (MCF7, MDA-MB-231, HS578T, BT-549 and T47D) were correlated against their doubling times17. The resulting values, ranging from −1 to 1, were analysed by TEISER to probe the enrichment/depletion pattern of transcripts carrying sRSM1. b, The growth of HNRPA2B1 siRNA-transfected samples was compared to those of mock-transfected controls. For each time-point, the number of cells in four independent samples was counted in duplicates (n = 8), yielding an estimated growth-rate (α). Shown are the average log-ratios, their standard deviation at each time-point, and the statistical significance of the observed difference in growth-rate.
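
The statistics named in the Figure 1 and Figure 2 captions (equally populated bins, hypergeometric enrichment, mutual information and a z score against shuffled profiles) can be sketched in a few lines of Python. This is a simplified illustration rather than the TEISER code: the function names, the use of bits as the MI unit, the sample standard deviation, the toy data and the default of 1,000 shuffles are all assumptions made here for brevity (the caption reports 1.5 million shuffles).

```python
import math
import random
from collections import Counter


def equal_population_bins(values, n_bins):
    """Assign each value a bin index (0 = lowest values) so bins are equally populated."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(presence, bins):
    """Mutual information, in bits, between motif presence (0/1) and bin labels."""
    n = len(presence)
    joint = Counter(zip(presence, bins))
    p_counts = Counter(presence)
    b_counts = Counter(bins)
    return sum(
        (c / n) * math.log2(c * n / (p_counts[p] * b_counts[b]))
        for (p, b), c in joint.items()
    )


def mi_zscore(presence, bins, n_shuffles=1000, seed=0):
    """Observed MI and its z score relative to MIs from shuffled bin labels."""
    rng = random.Random(seed)
    observed = mutual_information(presence, bins)
    shuffled = list(bins)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(presence, shuffled))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (len(null) - 1))
    return observed, (observed - mean) / sd


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): at least k motif carriers in a bin of n transcripts,
    when K of the N transcripts overall carry the motif."""
    total = math.comb(N, n)
    return sum(
        math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    ) / total


# Toy data (hypothetical): 100 transcripts, motif present only in the 20 lowest values.
stability = list(range(100))
presence = [1 if s < 20 else 0 for s in stability]
bins = equal_population_bins(stability, 5)
mi, z = mi_zscore(presence, bins, n_shuffles=200)
p_first_bin = hypergeom_enrichment_p(k=20, n=20, K=20, N=100)
print(mi, z, p_first_bin)
```

The heatmap entries described in the caption are log-transformed P-values; the function above returns the untransformed P-value so that the transformation stays an explicit, separate step.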

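Figure 4b reports an estimated growth rate (α) from cell counts at successive time points, but the caption does not say how α was obtained. One common approach, assumed here purely for illustration, is to fit exponential growth N(t) = N0·exp(αt) by least squares on the logarithm of the counts. The function names and the example counts below are hypothetical.

```python
import math


def growth_rate(times, counts):
    """Least-squares slope of ln(count) versus time, i.e. alpha in N(t) = N0 * exp(alpha * t)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return sxy / sxx


def doubling_time(alpha):
    """Time needed for the population to double at growth rate alpha."""
    return math.log(2) / alpha


# Hypothetical counts for a culture that doubles every 24 hours.
hours = [0, 24, 48, 72]
cells = [1000 * 2 ** (t / 24) for t in hours]
alpha = growth_rate(hours, cells)
print(alpha, doubling_time(alpha))
```

Comparing α between knock-down and control samples, for example as a log-ratio, then gives a single number per condition to test for a difference in growth.
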
Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Dölken, L. et al. High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay. RNA 14, 1959–1972 (2008)
  2. Elemento, O., Slonim, N. & Tavazoie, S. A universal framework for regulatory element discovery across all genomes and data types. Mol. Cell 28, 337–350 (2007)
  3. Rabani, M., Kertesz, M. & Segal, E. Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes. Proc. Natl Acad. Sci. USA 105, 14885–14890 (2008)
  4. Barash, Y. et al. Deciphering the splicing code. Nature 465, 53–59 (2010)
  5. Wan, Y., Kertesz, M., Spitale, R. C., Segal, E. & Chang, H. Y. Understanding the transcriptome through RNA structure. Nature Rev. Genet. 12, 641–655 (2011)
  6. Pavesi, G., Mauri, G., Stefani, M. & Pesole, G. RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Res. 32, 3258–3269 (2004)
  7. Searls, D. B. The language of genes. Nature 420, 211–217 (2002)
  8. Hofacker, I. L., Fekete, M. & Stadler, P. F. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. 319, 1059–1066 (2002)
  9. Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010)
  10. Goodarzi, H., Elemento, O. & Tavazoie, S. Revealing global regulatory perturbations across human cancers. Mol. Cell 36, 900–911 (2009)
  11. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011)
  12. Cutroneo, K. R. & Ehrlich, H. Silencing or knocking out eukaryotic gene expression by oligodeoxynucleotide decoys. Crit. Rev. Eukaryot. Gene Expr. 16, 23–30 (2006)
  13. Windbichler, N. & Schroeder, R. Isolation of specific RNA-binding proteins using the streptomycin-binding RNA aptamer. Nature Protocols 1, 637–640 (2006)
  14. Biamonti, G., Ruggiu, M., Saccone, S., Della Valle, G. & Riva, S. Two homologous genes, originated by duplication, encode the human hnRNP proteins A2 and A1. Nucleic Acids Res. 22, 1996–2002 (1994)
  15. Wilusz, C. J., Wormington, M. & Peltz, S. W. The cap-to-tail guide to mRNA turnover. Nature Rev. Mol. Cell Biol. 2, 237–246 (2001)
  16. Michlewski, G. & Caceres, J. F. Antagonistic role of hnRNP A1 and KSRP in the regulation of let-7a biogenesis. Nature Struct. Mol. Biol. 17, 1011–1018 (2010)
  17. Ross, D. T. et al. Systematic variation in gene expression patterns in human cancer cell lines. Nature Genet. 24, 227–235 (2000)
  18. Jensen, K. B. & Darnell, R. B. CLIP: crosslinking and immunoprecipitation of in vivo RNA targets of RNA-binding proteins. Methods Mol. Biol. 488, 85–98 (2008)
  19. Keene, J. D., Komisarow, J. M. & Friedersdorf, M. B. RIP-Chip: the isolation and identification of mRNAs, microRNAs and protein components of ribonucleoprotein complexes from cell extracts. Nature Protocols 1, 302–307 (2006)
  20. Licatalosi, D. D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008)
  21. Giannopoulou, E. G. & Elemento, O. An integrated ChIP-seq analysis platform with customizable workflows. BMC Bioinformatics 12, 277–294 (2011)
  22. Beer, M. A. & Tavazoie, S. Predicting gene expression from sequence. Cell 117, 185–198 (2004)
  23. Yang, Y. et al. RNA secondary structure in mutually exclusive splicing. Nature Struct. Mol. Biol. 18, 159–168 (2011)
  24. Greco, T. M., Yu, F., Guise, A. J. & Cristea, I. M. Nuclear import of histone deacetylase 5 by requisite nuclear localization signal phosphorylation. Mol. Cell Proteomics 10, M110.004317 (2011)
  25. Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nature Methods 6, 359–362 (2009)
  26. Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479–486 (2009)

Author information

Affiliations

  1. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA

    • Hani Goodarzi,
    • Panos Oikonomou &
    • Saeed Tavazoie
  2. Department of Molecular Biology, Princeton University, Princeton, New Jersey 08540, USA

    • Hani Goodarzi,
    • Panos Oikonomou,
    • Todd M. Greco,
    • Ileana M. Cristea &
    • Saeed Tavazoie
  3. Institute of Parasitology, McGill University, Montreal, Quebec H3G1Y6, Canada

    • Hamed S. Najafabadi &
    • Reza Salavati
  4. McGill Centre for Bioinformatics, McGill University, Montreal, Quebec H3G1Y6, Canada

    • Hamed S. Najafabadi &
    • Reza Salavati
  5. Laboratory of Systems Cancer Biology, Rockefeller University, New York, New York 10065, USA

    • Lisa Fish
  6. Department of Biochemistry, McGill University, Montreal, Quebec H3G1Y6, Canada

    • Reza Salavati
  7. Present addresses: Department of Biochemistry and Molecular Biophysics, and Initiative in Systems Biology, Columbia University, New York, New York 10032, USA (H.G., P.O., S.T.); The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada (H.S.N.).

    • Hani Goodarzi,
    • Hamed S. Najafabadi,
    • Panos Oikonomou &
    • Saeed Tavazoie

Contributions

H.G., H.S.N. and S.T. conceived and designed the study. H.G. and H.S.N. developed TEISER. R.S. contributed to the execution of the study. H.G., H.S.N., T.M.G., P.O., I.M.C. and S.T. designed the experiments. H.G., P.O., L.F. and T.M.G. performed the experiments. H.G., H.S.N. and T.M.G. analysed the results. H.G., H.S.N. and S.T. wrote the paper.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

The microarray and high-throughput sequencing data are deposited at GEO under the umbrella accession number GSE35800.

Supplementary information

PDF files

  1. Supplementary Information (2.5M)

    This file contains Supplementary Figures 1-15, Supplementary Tables 1-2 and additional references.
