  • Letter

In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features

Abstract

RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing (refs 1–4). However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5′ splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.

Figure 1: Overview of structure-seq.
Figure 2: Structure-seq accurately maps 18S rRNA and agrees with gel-based in vivo structure probing.
Figure 3: Structure-seq reveals new features of mRNA secondary structures that prevail in vivo.
Figure 4: Structure-seq provides in vivo RNA structure information at nucleotide resolution across 10,623 mRNAs and reveals correlations between RNA structure and biological function.

Accession codes

Data deposits

Sequencing data are deposited in the Sequence Read Archive (SRA) on the NCBI website under the accession number SRP027216.

References

  1. Buratti, E. et al. RNA folding affects the recruitment of SR proteins by mouse and human polypurinic enhancer elements in the fibronectin EDA exon. Mol. Cell. Biol. 24, 1387–1400 (2004)

  2. Cruz, J. A. & Westhof, E. The dynamic landscapes of RNA architecture. Cell 136, 604–609 (2009)

  3. Kozak, M. Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13–37 (2005)

  4. Sharp, P. A. The centrality of RNA. Cell 136, 577–580 (2009)

  5. Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010)

  6. Li, F. et al. Regulatory impact of RNA secondary structure across the Arabidopsis transcriptome. Plant Cell 24, 4346–4359 (2012)

  7. Zheng, Q. et al. Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis. PLoS Genet. 6, e1001141 (2010)

  8. Wan, Y. et al. Genome-wide measurement of RNA folding energies. Mol. Cell 48, 169–181 (2012)

  9. Senecoff, J. F. & Meagher, R. B. In vivo analysis of plant RNA structure: soybean 18S ribosomal and ribulose-1,5-bisphosphate carboxylase small subunit RNAs. Plant Mol. Biol. 18, 219–234 (1992)

  10. Wells, S. E., Hughes, J. M. X., Igel, A. H. & Ares, M. Use of dimethyl sulfate to probe RNA structure in vivo. Methods Enzymol. 318, 479–493 (2000)

  11. Zaug, A. J. & Cech, T. R. Analysis of the structure of Tetrahymena nuclear RNAs in vivo: telomerase RNA, the self-splicing rRNA intron, and U2 snRNA. RNA 1, 363–374 (1995)

  12. Zemora, G. & Waldsich, C. RNA folding in living cells. RNA Biol. 7, 634–641 (2010)

  13. Mathews, D. H. et al. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl Acad. Sci. USA 101, 7287–7292 (2004)

  14. Oh, E., Zhu, J. Y. & Wang, Z. Y. Interaction between BZR1 and PIF4 integrates brassinosteroid and environmental responses. Nature Cell Biol. 14, 802–809 (2012)

  15. Moazed, D., Stern, S. & Noller, H. F. Rapid chemical probing of conformation in 16 S ribosomal RNA and 30 S ribosomal subunits using primer extension. J. Mol. Biol. 187, 399–416 (1986)

  16. Cannone, J. J. et al. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3, 2 (2002)

  17. Gutell, R. R., Lee, J. C. & Cannone, J. J. The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol. 12, 301–310 (2002)

  18. Shabalina, S. A., Ogurtsov, A. Y. & Spiridonov, N. A. A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res. 34, 2428–2437 (2006)

  19. Branco-Price, C., Kawaguchi, R., Ferreira, R. B. & Bailey-Serres, J. Genome-wide analysis of transcript abundance and translation in Arabidopsis seedlings subjected to oxygen deprivation. Ann. Bot. (Lond.) 96, 647–660 (2005)

  20. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009)

  21. Branco-Price, C., Kaiser, K. A., Jang, C. J., Larive, C. K. & Bailey-Serres, J. Selective mRNA translation coordinates energetic and metabolic adjustments to cellular oxygen deprivation and reoxygenation in Arabidopsis thaliana. Plant J. 56, 743–755 (2008)

  22. Shen, Y. et al. Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing. Genome Res. 21, 1478–1486 (2011)

  23. Loke, J. C. et al. Compilation of mRNA polyadenylation signals in Arabidopsis revealed a new signal element and potential secondary structures. Plant Physiol. 138, 1457–1468 (2005)

  24. Solnick, D. Alternative splicing caused by RNA secondary structure. Cell 43, 667–676 (1985)

  25. Jin, Y., Yang, Y. & Zhang, P. New insights into RNA secondary structure in the alternative splicing of pre-mRNAs. RNA Biol. 8, 450–457 (2011)

  26. Filichkin, S. A. et al. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 20, 45–58 (2010)

  27. Lu, Z. J., Gloor, J. W. & Mathews, D. H. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15, 1805–1813 (2009)

  28. Deigan, K. E., Li, T. W., Mathews, D. H. & Weeks, K. M. Accurate SHAPE-directed RNA structure determination. Proc. Natl Acad. Sci. USA 106, 97–102 (2009)

  29. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)

  30. Misra, V. K. & Draper, D. E. The linkage between magnesium binding and RNA folding. J. Mol. Biol. 317, 507–521 (2002)

  31. Lucks, J. B. et al. Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl Acad. Sci. USA 108, 11063–11068 (2011)

  32. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)

  33. Hajdin, C. E. et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl Acad. Sci. USA 110, 5498–5503 (2013)

  34. Smith, C. J. Diagnostic tests (2) – positive and negative predictive values. Phlebology 27, 305–306 (2012)

  35. Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010)

  36. Lawley, P. D. & Brookes, P. Further studies on the alkylation of nucleic acids and their constituent nucleotides. Biochem. J. 89, 127–138 (1963)

  37. Weeks, K. M. & Crothers, D. M. RNA recognition by Tat-derived peptides: interaction in the major groove? Cell 66, 577–588 (1991)

Acknowledgements

This research is supported by Human Frontier Science Program (HFSP) grant RGP0002/2009-C, the Penn State Eberly College of Science, and a Penn State Huck Institutes HITS grant to P.C.B. and S.M.A. We thank F. Pugh, Y. Li, A. Chan and K. Yen for help with Illumina sequencing; D. Mathews and A. Spasic for advice on RNA structure analysis; M. Axtell for reading of the manuscript; and P. Raghavan for access to the CyberSTAR server, funded by the National Science Foundation through grant OCI–0821527. We also thank L. Song, D. Chadalavada and S. Ghosh for discussions.

Author information

Contributions

Y.D. and C.K.K. performed the experiments. Y.D., Y.T. and C.K.K. performed data analysis. Statistical analyses were designed by Y.Z. and Y.T., with input from all authors. Y.D., Y.T. and C.K.K. contributed equally to this work. All authors contributed ideas, discussed the results and wrote the manuscript.

Corresponding authors

Correspondence to Philip C. Bevilacqua or Sarah M. Assmann.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Time course of DMS modification and overview of structure-seq libraries.

a, Time course of in vivo DMS modification of 18S rRNA in Arabidopsis etiolated seedlings. Five-day-old Arabidopsis etiolated seedlings were DMS-treated for different durations (1 min, 5 min, 15 min and 30 min; lanes 2–5, respectively). In all cases the final DMS concentration was 0.75% (75 mM). The 18S rRNA DMS modification read-out was assessed by gel-based probing, which was done here near the 5′ end to afford a view of the full-length RNA band. The 15-min time point is the optimal duration for DMS modification, as it is the longest time point for which single-hit kinetics still occur, as revealed by the intense full-length band. The 30-min time point is too long, as revealed by significant loss of the full-length band and increase of shorter length bands. Lanes 6–9 show the dideoxy sequencing of 18S rRNA. Lane 1 is the (−)DMS control. Lane 10 is a DNA marker (M) that was size fractionated to confirm the size of the full-length band (112 nt). b, DMS modification is RNA nucleotide specific. The plot shows the occurrence of each RNA base one nucleotide upstream of the position of reverse transcriptase stalling in the (+)DMS library and the (−)DMS library, respectively. The (+)DMS library shows higher occurrence of A and C than of U and G (A is more than 1 standard deviation higher than C, G and U, and C is more than 1 standard deviation higher than G and U when A is left out), consistent with the properties of DMS modification of nucleobases (ref. 36). The percentages of each RNA base in the (−)DMS library are also indicated and are found to be similar (within 1 standard deviation). This figure combines results from both biological replicates. c, The total number of reads was classified into different classes of RNAs on a percentage basis from a total number of 121,258,873 reads for the (+)DMS library and 85,371,519 reads for the (−)DMS library. This figure combines results from both biological replicates. d, Structure-seq read coverage.
RNA structure information from structure-seq is distributed evenly across transcripts, with no 3′ bias. Each of the 37,558 transcripts (all transcripts with ≥ 1 internal reverse transcriptase stop and length ≥ 100 nt) was divided into 100 bins to normalize the transcript length. The reverse transcriptase stops per each A and C nucleotide (top) and the reverse transcriptase stops per each A and C nucleotide with ≥ 1 reverse transcriptase stop (bottom) from both the (+)DMS library (black diamonds) and the (−)DMS library (grey triangles) were averaged within each bin and plotted. The reverse transcriptase stops are well distributed over the entire transcript length.
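
The length normalization in panel d can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and it assumes that reverse transcriptase stop counts have already been assigned to the modified nucleotide at each position. It covers only the "stops per A and C nucleotide" quantity (top plot).

```python
def binned_ac_stops(seq, stops, n_bins=100):
    """Mean RT stops per A/C nucleotide in each of n_bins equal slices of one transcript.

    Bins that contain no A or C are returned as None.
    """
    length = len(seq)
    if length != len(stops):
        raise ValueError("seq and stops must have the same length")
    if length < n_bins:
        raise ValueError("transcript is shorter than the number of bins")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, (base, n_stops) in enumerate(zip(seq.upper(), stops)):
        if base in "AC":
            b = i * n_bins // length  # map position i onto a bin index 0..n_bins-1
            sums[b] += n_stops
            counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]


def average_profiles(profiles):
    """Average binned profiles across transcripts, ignoring empty (None) bins."""
    n_bins = len(profiles[0])
    averaged = []
    for b in range(n_bins):
        values = [p[b] for p in profiles if p[b] is not None]
        averaged.append(sum(values) / len(values) if values else None)
    return averaged
```

A flat averaged profile across the 100 bins would correspond to the absence of 3′ bias described in the legend.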

Extended Data Figure 2 Structure-seq reveals in vivo RNA secondary structures for over 10,000 transcripts and correlates with mRNA abundance.

a, Structure-seq reveals in vivo RNA secondary structures for over 10,000 transcripts. The histogram shows the number of transcripts as a function of the average reverse transcriptase stops per A + C nucleotide, that is, the reverse transcriptase stops associated with A + C nucleotides of a transcript divided by the total number of A + C nucleotides of that transcript, calculated for all individual transcripts in our data set. (Note that it is expected that not all As and Cs of a transcript will be DMS-modified and associated with a reverse transcriptase stop, because some As and Cs will be protected, for example, by base-pairing, tertiary structure or protein binding.) There are 10,781 transcripts with ≥ 1 average read per A + C nucleotide (dark shading and to the right of the right-most dashed red line). With a threshold of 0.5 average reads per A + C nucleotide, there are 15,565 transcripts (to the right of the left-most dashed red line). It is of interest to compare structure-seq, which provides the first high-throughput in vivo RNA structurome, with previous high-throughput studies of RNA structures conducted in vitro (refs 5–8). We have coverage with ≥ 1 average reverse transcriptase stop per nucleotide across 10,623 mRNAs, which compares favourably with 3,000 mRNAs with load (number of reads per nucleotide) > 1 from an in vitro study of yeast (ref. 5). In comparison with 3.9 × 10^5 reads (0.0078 RNase One cleavages per nucleotide on average) on mRNAs in the single-stranded RNA-seq library of an in vitro study of RNA structure in Arabidopsis (ref. 6), we have much improved coverage with 7.1 × 10^7 reads (1.4 reverse transcriptase stops per nucleotide on average) on mRNAs in our (+)DMS in vivo library. b, c, Structure-seq queries in vivo RNA structures in proportion to their abundance in the transcriptome. mRNA abundance within our structure-seq data set is highly correlated with mRNA abundance from RNA-seq analysis in this study (b) and with RNA-seq analysis from a previous study (c; ref. 14).
Correlation of mRNA abundance is based on average sequencing reads per mRNA between structure-seq and RNA-seq. The RNA-seq data set in our study was generated in parallel with the structure-seq data set from seedlings under the identical growth conditions but without DMS; that is, the RNA-seq data are extracted from the (−)DMS library. The RNA-seq data set from ref. 14 was generated from five-day-old etiolated seedlings. The PCCs of 0.89 and 0.78, respectively, indicate that more abundant mRNAs are more likely to have sufficient coverage available for structure-seq analysis.
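
The coverage statistic in panel a (reverse transcriptase stops on A + C nucleotides divided by the number of A + C nucleotides) and the two thresholds of 1 and 0.5 can be illustrated with a short sketch. The names `stops_per_ac` and `select_covered` are hypothetical, and the input layout (a dictionary mapping a transcript name to its sequence and per-position stop counts) is an assumption made for the example.

```python
def stops_per_ac(seq, stops):
    """RT stops on A and C nucleotides divided by the number of A and C nucleotides."""
    ac_positions = [i for i, base in enumerate(seq.upper()) if base in "AC"]
    if not ac_positions:
        return 0.0  # no probe-able nucleotides, so no usable coverage
    return sum(stops[i] for i in ac_positions) / len(ac_positions)


def select_covered(transcripts, threshold=1.0):
    """Names of transcripts whose coverage per A + C nucleotide meets the threshold.

    `transcripts` maps a name to a (sequence, per-position stop counts) pair.
    """
    return [
        name
        for name, (seq, stops) in transcripts.items()
        if stops_per_ac(seq, stops) >= threshold
    ]
```

Lowering `threshold` from 1.0 to 0.5 admits more transcripts, mirroring the increase from 10,781 to 15,565 transcripts reported above.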

Extended Data Figure 3 Structure-seq provides the complete map of the 18S rRNA in vivo structure at nucleotide resolution.

a, Structure-seq provides the complete map of the 18S rRNA in vivo structure at nucleotide resolution. The complete 18S rRNA phylogenetic structure (ref. 16) is colour-coded according to the DMS reactivity generated from structure-seq (DMS reactivity ≥ 0.6 marked in red; DMS reactivity 0.3–0.6 marked in yellow; DMS reactivity 0–0.3 marked in green; and U/G bases marked in grey). b, High correlation between structure-seq and 18S rRNA phylogenetic structure. In the entire 18S rRNA (length = 1,808 nt), 86.7% (true positive) of the As and Cs that show high in vivo DMS reactivity (defined as ≥ 0.6) in our data set correspond to single-stranded regions in the phylogenetic structure (ref. 16), whereas 52.0% (true negative) of the As and Cs that show low in vivo DMS reactivity (defined as ≤ 0.3) in our data set correspond to base-paired regions in the phylogenetic structure. The 48.0% (false negative) of the As and Cs that show low in vivo DMS reactivity in our data set but correspond to single-stranded regions in the phylogenetic structure presumably are protected by either ribosomal proteins or non-base-pairing tertiary RNA structure. Of the 13.3% (false positive) reactive nucleotides (defined as ≥ 0.6 from structure-seq) that are annotated as base-paired in the phylogenetic structure, 75% are positioned either at the end of a helix or adjacent to a helical defect such as a bulge or loop, locations that are known to lead to flexibility (ref. 37). Values in parentheses, corrected for this positioning, show higher true positive and lower false positive percentages.
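
The comparison in panel b can be sketched as a simple tally. This is an illustrative reading of the legend, not the authors' code: the function name is hypothetical, the thresholds 0.6 and 0.3 are taken from the legend, and `paired` is assumed to be a per-nucleotide list of booleans derived from the reference (phylogenetic) structure.

```python
HIGH = 0.6  # reactivity >= 0.6 is called "high" in the legend
LOW = 0.3   # reactivity <= 0.3 is called "low" in the legend


def agreement_with_reference(seq, reactivity, paired):
    """Compare DMS reactivity at A/C positions with a reference structure.

    Returns the fraction of high-reactivity A/Cs that are unpaired in the
    reference ("true positive" in the legend) and the fraction of
    low-reactivity A/Cs that are paired ("true negative" in the legend).
    """
    high_total = high_unpaired = 0
    low_total = low_paired = 0
    for base, r, is_paired in zip(seq.upper(), reactivity, paired):
        if base not in "AC":
            continue  # DMS reactivity is only scored on A and C
        if r >= HIGH:
            high_total += 1
            high_unpaired += 0 if is_paired else 1
        elif r <= LOW:
            low_total += 1
            low_paired += 1 if is_paired else 0
    return {
        "true_positive": high_unpaired / high_total if high_total else None,
        "true_negative": low_paired / low_total if low_total else None,
    }
```

Under this reading, the false positive and false negative fractions are the complements of the two returned values, consistent with 86.7% + 13.3% and 52.0% + 48.0% in the legend.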

Extended Data Figure 4 Structure-seq results are strongly correlated with results from the conventional gel-based RNA structure probing method.

a, Nucleotides 87–207 of 18S rRNA were probed by the conventional gel-based method. Lanes 1–2 show the (−)DMS and (+)DMS results on the region of interest. Lanes 3–4 show C and A dideoxy sequencing. For both this panel and structure-seq, the starting material was the same total population of in vivo DMS-modified RNA. b, The results from structure-seq (blue bars) are compared to results from the conventional gel-based method, presented as normalized band intensity (black lines), with the highest intensity normalized to 100% (ref. 5). The red asterisks indicate nucleotides that have significant DMS modifications from both methods, and are also shown in panel a. Structure-seq results are strongly correlated with results from the conventional gel-based RNA structure probing method: the PCC between the two methods is 0.71. c, d, Nucleotides 298–428 of 18S rRNA as probed by structure-seq and also analysed by the conventional gel-based method. The PCC is 0.68. e–g, Structure-seq results are also strongly correlated with results from the conventional gel-based RNA structure probing method for an individual mRNA, CAB1 (At1g29930). The 5′ UTR of CAB1 was probed by structure-seq and analysed by the gel-based method; in both cases, the starting material was the same total population of in vivo DMS-modified RNA. e, Lanes 1–2 show the (−)DMS and (+)DMS results on the region of interest as analysed by the conventional gel-based method. A 10-nt marker (M) was size fractionated (lane 3) to allow nucleotide assignment based on spacing. f, DMS reactivity from structure-seq is plotted with nucleotide resolution (blue bars). Results from the gel-based RNA structure probing method are presented as normalized quantified band intensity (black lines), with the highest intensity normalized to 100% (ref. 5). For the gel-based method, the nucleotides near the 5′ end cannot be confidently quantified and assigned due to band compression at the top of the gel and proximity to the full-length band.
The PCC between the two methods is 0.66. g, The secondary structure of the 5′ UTR of CAB1 mRNA (At1g29930) was determined using the in vivo DMS constraints obtained from structure-seq. (DMS reactivity ≥ 0.6 marked in red; DMS reactivity 0.3–0.6 marked in yellow; DMS reactivity 0–0.3 marked in green; and U/G bases marked in grey).
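
The comparison between the two methods relies on two simple operations: scaling band intensities so that the highest equals 100%, and computing the Pearson correlation coefficient (PCC) between the two per-nucleotide profiles. A minimal sketch, with hypothetical function names and assuming the two profiles have been aligned position by position:

```python
import math


def normalize_to_max(intensities):
    """Scale values so that the highest intensity equals 100 (percent)."""
    top = max(intensities)
    if top <= 0:
        raise ValueError("the highest intensity must be positive")
    return [100.0 * v / top for v in intensities]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Because the PCC is invariant to positive rescaling, normalizing the gel intensities to 100% affects only how the profiles are displayed, not the reported correlation.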

Extended Data Figure 5 Structure-seq reveals global trends in mRNA secondary structure in vivo that correlate with translation efficiency.

a, Average DMS reactivity on an A + C nucleotide basis in selected regions of 22,721 mRNAs (including all splice variants) that have 5′ and 3′ UTR regions longer than 40 nt: 5′ UTR region (40 nt upstream of the start codon); CDS initial region (100 nt downstream of the start codon); CDS final region (100 nt upstream of the stop codon); and 3′ UTR region (40 nt downstream of the stop codon) are depicted. The transcripts were aligned by their start codon and stop codon (vertical lines). (Us and Gs in the start codon and the stop codon were not counted, marked by a break in the red line.) The 40-nt 5′ UTR and 3′ UTR regions show significantly higher average DMS reactivity than the flanking 100 nt of the CDS region, with P values of 10^−4 and 10^−18, respectively (Student’s t-tests). The first 5 nt immediately upstream of the start codon show significantly higher reactivity than the average DMS reactivity across the first 100 nt of the CDS, with a P value of 10^−112 (Student’s t-test). b, Discrete Fourier transform of average DMS reactivity on a nucleotide basis was performed on the 40-nt 5′ UTR (green line), the first 100 nt of the CDS (purple line) and the 40-nt 3′ UTR (blue line) regions. Only the CDS shows the periodic signal. For the analysis, the 40-nt 5′ UTRs and 3′ UTRs were compared to the first 100 nt of the CDS regions. c, The average DMS reactivity of the three positions in each codon was computed from the entire CDS regions of all 22,721 mRNAs. The first position of each codon shows significantly higher average DMS reactivity compared with the second position of each codon (P = 10^−27). The third position of each codon shows significantly higher average DMS reactivity compared with the second position (P = 10^−5) but significantly lower average DMS reactivity compared with the first position of each codon (P = 10^−17) (Student’s t-tests).
d, Structure-seq reveals significantly stronger periodic signal in the coding regions of high translation efficiency mRNAs (1,136 mRNAs) as compared to low translation efficiency mRNAs (1,136 mRNAs). We analysed the polyribosome-associated mRNA populations defined in a previous study (ref. 21), ranking the mRNAs according to their polyribosome-associated mRNA abundance (ref. 21). We defined the top 5% (n = 1,136 mRNAs) as the ‘high translation efficiency mRNAs’ and the bottom 5% (n = 1,136 mRNAs) as the ‘low translation efficiency mRNAs’. The average DMS reactivity of the three positions of each codon was computed along the entire CDS for the high translation efficiency mRNAs and the low translation efficiency mRNAs. The difference in average DMS reactivity between the three nucleotides is significantly greater in the high translation efficiency transcripts (nt 1–2, P = 10^−23; nt 2–3, P = 0.02; nt 1–3, P = 10^−15) than in the low translation efficiency transcripts (nt 1–2, P = 0.29; nt 2–3, P = 0.99; nt 1–3, P = 0.34) (Student’s t-tests). e, No nucleotide or codon bias in high versus low translation efficiency mRNAs occurs in any of the three positions of the codon. There is no difference between high translation efficiency mRNAs (1,136 mRNAs) and low translation efficiency mRNAs (1,136 mRNAs) in the frequency of nucleotide occurrence at each codon position. The correlation between the codon usage of the high translation efficiency mRNAs and low translation efficiency mRNAs is very high (PCC = 0.90).
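
The periodicity analysis (panel b) and the per-codon-position averages (panels c and d) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, the input is assumed to be an average reactivity profile that starts at the first nucleotide of the CDS, and the discrete Fourier transform is computed directly rather than with an FFT library.

```python
import cmath
import math


def dft_power(signal):
    """Power of the mean-centred signal at each non-zero DFT frequency.

    Returns (period_in_nt, power) pairs for k = 1 .. n // 2.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred))
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum


def dominant_period(signal):
    """Period (in nucleotides) carrying the most power."""
    return max(dft_power(signal), key=lambda pair: pair[1])[0]


def codon_position_means(reactivity):
    """Mean reactivity at codon positions 1, 2 and 3; None entries are skipped."""
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i, r in enumerate(reactivity):
        if r is None:
            continue  # e.g. U/G positions with no DMS read-out
        sums[i % 3] += r
        counts[i % 3] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

For a profile with a three-nucleotide repeat, `dominant_period` returns a value at or near 3, and `codon_position_means` recovers the per-position averages that the Student's t-tests in panels c and d compare.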

Extended Data Figure 6 Control analyses for alternative polyadenylation and alternative splicing.

a, The percentages of nucleotide occurrence around the site of alternative polyadenylation show a U/A-rich region from −15 nt to −2 nt (P = 10^−16, Student’s t-test), and the region from 1 nt upstream to 5 nt downstream (nt −1 to 5) of the cleavage site is A-rich (P = 10^−5, Student’s t-test). This pattern is similar to that reported for a combined data set of all polyadenylation sites (ref. 23). The percentages of nucleotide occurrence are plotted relative to the alternative polyadenylation site position collected from a previous study (ref. 22), indicated by 0: (A (orange diamonds); U (dark red squares); C (blue circles); and G (green triangles)). b, c, Nucleotide composition and sequence alone cannot account for the RNA structural pattern of the alternative polyadenylation site. b, We identified 20-nt regions in our structure-seq mRNA data set that are not alternative polyadenylation cleavage sites but contain the same exact nucleotide sequence as the region 15 nt upstream and 5 nt downstream of each alternative polyadenylation cleavage site that we analysed. The percentages of nucleotide occurrence are plotted relative to the position corresponding to where the alternative polyadenylation site (designated as position zero) would be situated: (A (orange diamonds); U (dark red squares); C (blue circles); and G (green triangles)). c, For the selected control region from panel b, DMS reactivity of these selected 20-nt control regions as well as the regions upstream (35 nt) and downstream (45 nt) was averaged on a nucleotide basis and plotted, revealing absence of any structural features (violet line). d, Extensive RNA secondary structure was not apparent at the 3′ splice site. A previous genome-wide study of alternative splicing (AS) in Arabidopsis seedlings (ref. 26) was used to identify, for each mRNA in our data set, whether all introns were spliced out or whether AS (including exon skipping and intron retention) occurred.
DMS reactivity along 100 nt in the exons upstream of the 3′ splice site was averaged on a nucleotide basis from the unspliced events, including both exon skipping and intron retention (green lines), and the spliced events (yellow lines). The same nucleotide composition of the 100 nt in the unspliced AS events was shuffled and remapped to regions in our structure-seq mRNA data set that were not located at the junction of a 3′ splice site. The averaged DMS reactivity collected from the control regions with the same nucleotide composition served as the control (grey lines).

Extended Data Figure 7 In vitro structures differ from in vivo structures; PPV does not correlate with average DMS reactivity or with mRNA length.

a, In vitro structures differ from in vivo structures, and in vitro structures are more similar to in silico structures than are in vivo structures. The 61 Arabidopsis mRNAs with coverage ≥ 0.5 cleavages per nucleotide from Li et al.’s in vitro data were compared among the in silico structure (from RNAstructure), the in vitro structure (in silico structures from RNAstructure constrained by Li et al.’s in vitro data; ref. 6), and the in vivo structure (in silico structures from RNAstructure constrained by our in vivo data). PPV (the base pairs in one structure that are also present in another structure, as a proportion) was averaged across these 61 mRNAs. The PPV between in vitro structures and in silico structures is 0.77, which is significantly higher than the PPV between in vivo structures and in silico structures and is also significantly higher than the PPV between in vivo and in vitro structures, according to two-sample t-tests with P values as shown in the figure. In vivo structures are different from both in vitro structures (PPV = 0.51) and in silico structures (PPV = 0.55). b, PPV does not correlate with average DMS reactivity per nucleotide. For each of 10,623 mRNAs in our structure-seq data set, the corresponding PPV was plotted as a function of average DMS reactivity per nucleotide, revealing an absence of correlation between PPV and average DMS reactivity per nucleotide (PCC = −0.33). c, PPV does not correlate with mRNA length. For each of 10,623 mRNAs, the corresponding PPV of each mRNA was plotted as a function of mRNA length, revealing an absence of correlation between these two variables (PCC = −0.10).
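
The PPV defined in panel a (the proportion of base pairs in one structure that are also present in another) can be computed from two secondary structures of the same sequence. The sketch below assumes the structures are given in dot-bracket notation without pseudoknots, which is an assumption made for the example; the function names are hypothetical, and which structure serves as the denominator is a choice the legend does not spell out.

```python
def pairs_from_dot_bracket(structure):
    """Set of (i, j) base pairs encoded by a pseudoknot-free dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def ppv(predicted, reference):
    """Proportion of base pairs in `predicted` that also occur in `reference`."""
    predicted_pairs = pairs_from_dot_bracket(predicted)
    reference_pairs = pairs_from_dot_bracket(reference)
    if not predicted_pairs:
        return 0.0  # convention chosen here for structures with no base pairs
    return len(predicted_pairs & reference_pairs) / len(predicted_pairs)
```

Note that PPV is not symmetric: swapping the two arguments changes the denominator, so comparisons should fix the roles of the two structures consistently.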

Extended Data Figure 8 Examples for in vivo and in silico structural feature comparison of high and low PPV mRNAs.

a, Ten examples for in vivo and in silico structural comparison of high and low PPV mRNAs. Five examples from the high PPV mRNA group (top) and five examples from the low PPV mRNA group (bottom). At1g52600 and At3g05880 mRNA structures were given in Fig. 4d. Base pair predictions are indicated with coloured lines: red, uniquely in vivo base pair; black, uniquely in silico base pair; green, base pair present in both the in vivo and the in silico structure. Plots were generated using the CircleCompare program in the RNAstructure package (ref. 35). Low PPV mRNAs show more extensive differences between in vivo and in silico structures than do high PPV mRNAs. b, Characteristics of in vivo and in silico structural features in the ten high and low PPV mRNAs. The same five examples from both high PPV and low PPV mRNAs as in a were assessed for RNA structural features in both in silico-predicted (without in vivo constraints) and in vivo (in silico prediction with constraints from our in vivo structure-seq data) structures. In vivo structures of low PPV mRNAs show more single-stranded regions, longer maximum loop length, and higher (that is, less favourable) free energy per nucleotide as compared to high PPV mRNAs. By contrast, in silico-predicted structures do not show such major differences between low and high PPV mRNAs.
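
The three structural features compared in panel b can be approximated from a dot-bracket structure and a folding free energy. This is a rough sketch under stated assumptions: the function names are hypothetical, the longest run of unpaired nucleotides is used only as a simple stand-in for maximum loop length (the exact definition is not given in this legend), and the free energy is assumed to be supplied by a folding program.

```python
def single_stranded_fraction(structure):
    """Fraction of nucleotides that are unpaired ('.') in a dot-bracket string."""
    if not structure:
        raise ValueError("empty structure")
    return structure.count(".") / len(structure)


def longest_unpaired_run(structure):
    """Length of the longest stretch of consecutive unpaired nucleotides.

    Used here as a crude proxy for maximum loop length.
    """
    longest = current = 0
    for ch in structure:
        if ch == ".":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def free_energy_per_nt(delta_g, length):
    """Folding free energy divided by transcript length (same units as delta_g per nt)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return delta_g / length
```

A less negative value from `free_energy_per_nt` corresponds to the "higher (that is, less favourable)" free energy per nucleotide described for low PPV mRNAs.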

Extended Data Table 1 Statistical analysis of structure-seq libraries
Extended Data Table 2 In vivo constraints improve the prediction of structure in 18S rRNA

About this article

Cite this article

Ding, Y., Tang, Y., Kwok, C. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2014). https://doi.org/10.1038/nature12756

  • DOI: https://doi.org/10.1038/nature12756
