RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP)

Journal name: Nature Methods
Volume: 11
Pages: 959–965
Year published:
DOI: 10.1038/nmeth.3029

Abstract

Many biological processes are RNA-mediated, but higher-order structures for most RNAs are unknown, which makes it difficult to understand how RNA structure governs function. Here we describe selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) that makes possible de novo and large-scale identification of RNA functional motifs. Sites of 2′-hydroxyl acylation by SHAPE are encoded as noncomplementary nucleotides during cDNA synthesis, as measured by massively parallel sequencing. SHAPE-MaP–guided modeling identified greater than 90% of accepted base pairs in complex RNAs of known structure, and we used it to define a new model for the HIV-1 RNA genome. The HIV-1 model contains all known structured motifs and previously unknown elements, including experimentally validated pseudoknots. SHAPE-MaP yields accurate and high-resolution secondary-structure models, enables analysis of low-abundance RNAs, disentangles sequence polymorphisms in single experiments and will ultimately democratize RNA-structure analysis.

Figures

  1. Figure 1: SHAPE-MaP overview.

    RNA is treated with a SHAPE reagent that reacts at conformationally dynamic nucleotides. During reverse transcription, polymerase reads through chemical adducts in the RNA and incorporates a nucleotide noncomplementary to the original sequence (red) into the cDNA. The resulting cDNA is sequenced using any massively parallel approach to create a mutational profile. Sequencing reads are aligned to a reference sequence, and nucleotide-resolution mutation rates are calculated, corrected for background and normalized, producing a standard SHAPE reactivity profile. SHAPE reactivities can then be used to model secondary structures, visualize competing and alternative structures, or quantify any process or function that modulates local nucleotide RNA dynamics.

  2. Figure 2: Nucleotide-resolution interrogation of RNA structure and ligand-induced conformational changes.

    (a) Mutation rate profiles for the SHAPE-modified and untreated TPP riboswitch RNA in the presence of ligand (top) and for SHAPE modification performed under denaturing conditions (bottom). (b) Quantitative SHAPE-MaP obtained after subtracting the data from the untreated sample from data for the treated sample and normalizing by the denatured control. (c) SHAPE reactivities plotted on the accepted secondary structure of the ligand-bound TPP riboswitch. Red, orange and black correspond to high, moderate and low reactivities, respectively. (d) Difference SHAPE profile showing conformational changes in the TPP riboswitch upon ligand binding. (e) Superposition of ligand-induced conformational changes on the TPP riboswitch structure. Data are representative of two biological replicates.
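
The background subtraction and normalization described in panel (b) can be sketched as follows. This is a minimal illustration, not the ShapeMapper implementation: the function name, the list-based inputs and the handling of zero denatured rates are assumptions made here, and any further scaling of the profile is omitted.

```python
def shape_reactivities(modified, untreated, denatured):
    """Per-nucleotide reactivity: (modified - untreated) / denatured.

    Each argument is a list of per-nucleotide mutation rates. Positions with a
    zero denatured rate return None (an assumption made for this sketch).
    """
    if not (len(modified) == len(untreated) == len(denatured)):
        raise ValueError("all three profiles must have the same length")
    reactivities = []
    for m, u, d in zip(modified, untreated, denatured):
        # Subtract background (untreated), then normalize by the denatured control.
        reactivities.append((m - u) / d if d > 0 else None)
    return reactivities


# Hypothetical mutation rates for three nucleotides.
profile = shape_reactivities(
    modified=[0.010, 0.002, 0.030],
    untreated=[0.002, 0.002, 0.005],
    denatured=[0.040, 0.050, 0.0],
)
```

The third position has no usable denatured signal, so it is reported as missing rather than assigned an arbitrary value.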

  3. Figure 3: Accuracy of SHAPE-MaP–directed modeling of secondary structure.

    (a) Secondary structure modeling accuracies reported as a function of sensitivity (sens) and positive predictive value (ppv) for calculations performed without experimental constraints (no data), with conventional capillary electrophoresis (CE) data, and with SHAPE-MaP data obtained with the 1M7 reagent (refs. 22, 38) or with three-reagent differential (Diff) data (ref. 19). Minus signs indicate no data were collected. Results are colored on a scale to reflect low (red) to high (green) modeling accuracy. (b) Relationship between sequencing read depth, hit level and accuracy of RNA-structure modeling. Model accuracy (vertical axis) is shown as the geometric average of the sensitivity and positive predictive value of predicted structures with respect to the accepted model (ref. 38). Boxplots summarize modeling the secondary structure of the 16S ribosomal RNA as a function of simulated SHAPE-MaP read depth. At each depth, 100 folding trajectories were sampled. The line at the center of the box indicates the median value, and boxes indicate the interquartile range. Whiskers contain data points that are within 1.5 times the interquartile range, and outliers are indicated with (+) marks. Hit level is the total signal above normalized background per transcript nucleotide (Online Methods).
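
The accuracy metrics named in this caption can be computed as in the sketch below. It assumes that base pairs are given as (i, j) tuples and that a predicted pair counts as correct only when it matches an accepted pair exactly; the scoring used in the paper may be more permissive, and the function name is hypothetical.

```python
from math import sqrt


def modeling_accuracy(predicted_pairs, accepted_pairs):
    """Return (sensitivity, ppv, geometric mean) for two sets of base pairs."""
    # Store each pair as (smaller index, larger index) so order does not matter.
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    accepted = {tuple(sorted(p)) for p in accepted_pairs}
    correct = len(predicted & accepted)
    # Sensitivity: fraction of accepted pairs that were predicted.
    sens = correct / len(accepted) if accepted else 0.0
    # Positive predictive value: fraction of predicted pairs that are accepted.
    ppv = correct / len(predicted) if predicted else 0.0
    return sens, ppv, sqrt(sens * ppv)


# Hypothetical toy structures.
sens, ppv, accuracy = modeling_accuracy(
    predicted_pairs=[(1, 20), (2, 19), (3, 18), (5, 15)],
    accepted_pairs=[(1, 20), (2, 19), (3, 18), (4, 17), (30, 40)],
)
```

Here three of five accepted pairs are recovered (sensitivity 0.6) and three of four predicted pairs are correct (ppv 0.75).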

  4. Figure 4: SHAPE-MaP analysis of the HIV-1 NL4-3 genome.

    (a) SHAPE reactivities, Shannon entropy and pairing probability for the NL4-3 HIV-1 genomic RNA. Reactivities are shown as the centered 55-nt median window, relative to the global median; regions above or below the line are more flexible or constrained than the median, respectively. Arcs representing base pairs are colored by their respective pairing probabilities, with green arcs indicating highly probable helices. Areas with many overlapping arcs have multiple potential structures. Pseudoknots (PK) are indicated by black arcs. Data shown correspond to a single representative experiment; individual regions, including proposed pseudoknots, were confirmed by independent replicates. (b) RNA regions identified as having biological functions. Brackets enclose well-determined regions and are drawn to emphasize locations of these regions relative to known RNA features in the context of the viral genome. Regions with blue shading correspond to low-SHAPE, low-Shannon-entropy domains and are extended to include all intersecting helices from the lowest predicted free-energy secondary structure. 5′ and 3′ UTRs (including the repeat (R), Psi and dimerization initiation site (DIS) sequences) are brown, splice acceptors (A1–A8) and donors (D1–D4) are green and blue, respectively, polypurine tracts are yellow, variable domains (V1–V5) are purple, and the frameshift and RRE domains are red. These elements fall within regions with low SHAPE and low Shannon entropy much more frequently than expected by chance (*P = 0.002; Online Methods). (c) Secondary structure models for regions, identified de novo, with low SHAPE reactivities and low Shannon entropies. Nucleotides are colored by SHAPE reactivity and pseudoknotted structures are labeled in blue. Larger figure images, showing nucleotide identities, are provided in Supplementary Figure 7.
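
The two per-nucleotide quantities plotted in panel (a) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the base-10 logarithm, the subtraction of the global median, the truncation of windows at the ends of the RNA, and both function names are choices made here.

```python
from math import log10
from statistics import median


def shannon_entropy(pair_probs):
    """pair_probs maps nucleotide i to a list of probabilities P(i pairs with j)."""
    return {
        i: -sum(p * log10(p) for p in probs if p > 0)
        for i, probs in pair_probs.items()
    }


def windowed_median(values, window=55):
    """Centered median over `window` positions, relative to the global median."""
    half = window // 2
    global_median = median(values)
    smoothed = []
    for i in range(len(values)):
        # Windows are truncated at the ends of the profile (an assumption).
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]) - global_median)
    return smoothed


# Hypothetical inputs: nucleotide 1 has one certain partner, nucleotide 2 has
# two equally likely partners and therefore a higher entropy.
entropy = shannon_entropy({1: [1.0], 2: [0.5, 0.5]})
smoothed = windowed_median([0.1, 0.2, 0.9, 0.8, 0.1], window=3)
```

Positive values of the smoothed profile mark regions more flexible than the median, and negative values mark more constrained regions, matching the description in the caption.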

  5. Figure 5: Functional and structural validation of newly discovered HIV-1 RNA motifs.

    (a) Scheme for simultaneous deconvolution and structural analysis of a mixture of native sequence and U3PK mutant genomes. (b) SHAPE profiles for the U3PK pseudoknot bridging U3 and R. The experiment simultaneously probed a mixture of viruses with native sequence and mutant U3PK RNAs. Secondary structure for the native sequence is shown as arcs below the y-axis intercept. Significant SHAPE reactivity differences are highlighted (Online Methods). Data are shown for single experiments performed in parallel. (c) Direct growth competition and viral spread for U3PK mutant and native sequence NL4-3 HIV-1 virions in Jurkat cells. Each line in the viral spread assay is a biological replicate representative of 2–4 technical replicates. Percentage of mutant in the initial inoculum is presented as a gray square at day 0. p24 levels correspond to the amount of HIV-1 capsid protein. (d) SHAPE profiles for RTPK, obtained in separate experiments for each virus. (e) Viral spread and direct growth competition for RTPK mutant and native sequence NL4-3 HIV-1 virions in Jurkat cells. For the competition data, y axes are shown on an expanded scale for clarity.

  6. Supplementary Fig. 1: SHAPE-MaP data analysis pipeline.

    Outline of software pipeline that fully automates calculations of per-nucleotide mutation rates, SHAPE reactivities, and standard error estimates given high-throughput sequencing data and at least one reference sequence. The software is executable on Unix-based platforms. See Online Methods for full description. This strategy is implemented in the ShapeMapper software.

  7. Supplementary Fig. 2: Overview of multiscale RNA secondary structure modeling.

    Genome-scale RNA secondary structure modeling was performed in steps to increase computational efficiency and model accuracy and to facilitate incorporation of pseudoknot prediction into a global model. (top) The first step involved searching for pseudoknots in short windows. For all later steps, nucleotides in pseudoknot pairs were prohibited from forming base pairs. In the second step, the partition function was calculated in overlapping windows and averaged for Shannon entropy and pairing probability evaluations (see Online Methods). Base pairs with probabilities ≥99% were forced to form during calculation of a minimum free energy structure using Fold (ref. 47) in overlapping 4000-nt windows. In the third step, a consensus structure from the overlapping windows was generated by retaining base pairs that appeared in more than half of possible windows. Finally, pseudoknotted helices were added to the final model. (bottom) Comparison of windowed folding versus one-step folding for calculating the partition functions and minimum free energy structures for RNAs of 1500 to 9200 nt. The wall time for modeling the entire HIV-1 genome was estimated (asterisk). A small performance penalty is observed for splitting an RNA into overlapping windows. However, computation time for RNAs over 3000 nt scales approximately linearly with sequence length. Folding times are reported both as wall clock times and as CPU cycles (2012 iMac with 3.1 GHz Intel Core i7 and 16 GB RAM). This strategy is implemented in the SHAPE-MaP Folding Pipeline.
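
The third step (retaining base pairs that appear in more than half of the windows that could contain them) can be sketched as below. The data layout, the function name and the definition of a "possible" window as one that spans both nucleotides of the pair are assumptions made for illustration; this is not the SHAPE-MaP Folding Pipeline itself.

```python
from collections import Counter


def consensus_pairs(window_models):
    """Build a consensus structure from overlapping windowed models.

    window_models is a list of (start, end, pairs) tuples with 1-based
    inclusive coordinates; pairs is a set of (i, j) tuples with i < j.
    """
    seen = Counter()
    for _start, _end, pairs in window_models:
        for pair in pairs:
            seen[pair] += 1
    kept = set()
    for (i, j), count in seen.items():
        # A window is "possible" for a pair if it spans both nucleotides.
        possible = sum(
            1 for start, end, _pairs in window_models if start <= i and j <= end
        )
        if count > possible / 2:
            kept.add((i, j))
    return kept


# Hypothetical models from three overlapping windows.
models = [
    (1, 100, {(10, 50), (60, 90), (85, 95)}),
    (40, 140, {(60, 90), (100, 130)}),
    (80, 180, {(150, 170)}),
]
consensus = consensus_pairs(models)
```

In this toy case (85, 95) is dropped because only one of three possible windows predicts it, and (100, 130) is dropped because exactly half of its possible windows predict it, which does not satisfy "more than half".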

  8. Supplementary Fig. 3: Strategies for the SHAPE-MaP experiment using either gene-specific primers or random fragmentation for sample analysis and sequencing library preparation.

    SHAPE-MaP can be performed using gene-specific primers (for small RNAs or targeted areas in large RNAs and for analysis of scarce and low-concentration RNAs) or random primers (for comprehensive analysis of large RNAs or complete transcriptomes) to create the initial cDNA pool. For both approaches, RNA is treated with a SHAPE reagent or with solvent under conditions of interest, and a sample of RNA is modified under denaturing conditions. For gene-specific samples, reverse transcription and PCR primers are designed based on the known target sequence. Large RNAs are randomly fragmented in a buffered Mg2+ solution. Single-stranded cDNA is synthesized using mutation-prone reverse transcription; misincorporation events in the nascent cDNA mark the location of SHAPE adducts in the subject RNA. Double-stranded cDNAs are created either by PCR (gene-specific approach) or by second-strand synthesis (randomly fragmented samples). Sequencing platform-specific sequences (including multiplexing barcodes) are added to the dsDNA libraries, either directly through a second PCR (gene-specific approach) or by DNA-DNA ligation of adaptor sequences (randomly fragmented samples). Libraries prepared by either method are then sequenced, producing data that are processed into SHAPE reactivity profiles used in structure modeling applications. Once the initial cDNA has been synthesized, SHAPE-MaP is fully independent of sequencing platform and library generation scheme, so any platform and any library generation scheme can be used.

  9. Supplementary Fig. 4: Mutation rate histograms for paired and non-paired nucleotides in the 16S rRNA.

    Nucleotides were separated into paired (upper panels) and non-paired (lower panels) groups based on their observed pairing in the E. coli 16S rRNA (ref. 38). Mutation rate histograms for each experimental sample (SHAPE, untreated, and denatured) were calculated based on pairing status (left-hand panels). Distributions of mutation rates for the SHAPE-modified and untreated samples are similar for base-paired nucleotides, whereas nucleotides in non-paired conformations are much more reactive towards SHAPE probing. (right-hand panels) SHAPE-MaP reactivities are independent of nucleotide type.

  10. Supplementary Fig. 5: SHAPE-MaP replicates of E. coli 16S rRNA.

    Data correspond to full biological replicates performed six months apart by different individuals. The inset for nucleotides 1350-1450 (bottom right) shows standard errors.

  11. Supplementary Fig. 6: Error analysis for SHAPE-MaP.

    Deep bootstrapping of highly sequenced TPP riboswitch samples (see Fig. 2). Individual sequencing reads from a large pool (150,000) were sampled with replacement 100 times per simulated depth. The standard error of the SHAPE reactivity was calculated at each depth from each bootstrap. Consistent with a Poisson model, the standard error of the SHAPE measurement decreased as the -1/2 power of read depth across all nucleotides.
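
The −1/2 power dependence follows directly from Poisson counting statistics. As a sketch, assume that the number of mutations k_i observed at nucleotide i over n reads is Poisson distributed with mean n r_i, where r_i is the underlying mutation rate (these symbols are introduced here for illustration and do not appear in the original):

```latex
\begin{align}
k_i &\sim \mathrm{Poisson}(n\, r_i), \qquad \hat{r}_i = \frac{k_i}{n}, \\
\operatorname{Var}(\hat{r}_i) &= \frac{n\, r_i}{n^{2}} = \frac{r_i}{n}, \\
\mathrm{SE}(\hat{r}_i) &= \sqrt{\frac{r_i}{n}} \;\propto\; n^{-1/2}.
\end{align}
```

Because a SHAPE reactivity combines rates from the modified, untreated and denatured samples, its propagated error is a sum of terms of this form; if all three read depths grow together, the combined standard error keeps the same scaling.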

  12. Supplementary Fig. 7: Secondary structure models for regions of HIV-1 NL4-3, identified de novo, with low SHAPE reactivities and low Shannon entropies.

    Nucleotides are colored by SHAPE reactivity. Structures are the same as shown in Fig. 4c, except that nucleotide identities are shown explicitly.

  13. Supplementary Fig. 8: Pseudoknot mutants.

    Green arrows indicate sites of disruptive mutations.

  14. Supplementary Fig. 9: Deconvolution of profiles for two alleles of the U3PK in a single SHAPE-MaP experiment.

    RNAs with nearly identical sequences can be computationally separated and analyzed using data generated from a single experiment (see Online Methods). Yellow bars indicate significant SHAPE reactivity differences (and highlight the same regions shown in Fig. 5).
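
One way to separate reads from two nearly identical RNAs is to sort them by the bases they carry at the positions that distinguish the alleles. The sketch below illustrates that idea only; it assumes ungapped alignments, uses a hypothetical function name and data layout, and is not taken from the authors' pipeline. Reads with conflicting or missing evidence are left unassigned.

```python
def assign_allele(read, offset, diagnostic_sites):
    """Assign an aligned read to 'native', 'mutant', or None.

    read: read sequence (ungapped alignment assumed).
    offset: 0-based reference position of the first base of the read.
    diagnostic_sites: dict mapping a 0-based reference position to a
        (native_base, mutant_base) tuple.
    """
    votes = set()
    for pos, (native, mutant) in diagnostic_sites.items():
        idx = pos - offset
        if 0 <= idx < len(read):
            base = read[idx]
            if base == native:
                votes.add("native")
            elif base == mutant:
                votes.add("mutant")
    # Require unanimous evidence; otherwise leave the read unassigned.
    return votes.pop() if len(votes) == 1 else None


# Hypothetical diagnostic positions and reads.
sites = {3: ("A", "G"), 5: ("C", "T")}
calls = [
    assign_allele("TTTATCGG", 0, sites),  # native at both sites
    assign_allele("TTTGTTGG", 0, sites),  # mutant at both sites
    assign_allele("TTTATTGG", 0, sites),  # conflicting evidence
    assign_allele("GG", 10, sites),       # covers no diagnostic site
]
```

Once reads are binned this way, mutation rates and reactivities can be computed separately for each allele from the same experiment.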

  15. Supplementary Fig. 10: Pseudoknot SHAPE-MaP profiles for ENVPK and CAPK.

    (upper panel) SHAPE-MaP and structure profiles for ENVPK, with direct growth competition and viral spread data. (lower panel) SHAPE-MaP and structure profiles for CAPK, which is located in a high-entropy region of the RNA genome and thus served as a negative control. Competition and viral spread assay data are also displayed.

  16. Supplementary Fig. 11: Detection of 2'-O-adducts by mutational profiling.

    Shown are rates above background for sequence changes and unambiguously aligned deletions in the E. coli 16S rRNA. Nucleotides were defined as non-paired or paired based on the accepted secondary structure. The letter in the lower right of each panel indicates the expected nucleotide based on the coding strand, and the letters on the vertical axes indicate the nucleotide detected by sequencing or “del” for deletion. Rates are shown for the (a) 1M7, (b) 1M6, and (c) NMIA reagents. Nucleotide misincorporation and deletion rates were similar for the three SHAPE reagents.

  17. Supplementary Fig. 12: Primer design for SHAPE-MaP.

    Sequences with low or unevenly distributed GC-content benefit from the newly designed LNA-based primers used to analyze HIV-1 sequences in this work.

Accession codes

Primary accessions

Sequence Read Archive

References

  1. Sharp, P.A. The centrality of RNA. Cell 136, 577–580 (2009).
  2. Dethoff, E.A., Chugh, J., Mustoe, A.M. & Al-Hashimi, H.M. Functional complexity and regulation through RNA dynamics. Nature 482, 322–330 (2012).
  3. Weeks, K.M. Advances in RNA structure analysis by chemical probing. Curr. Opin. Struct. Biol. 20, 295–304 (2010).
  4. Mathews, D.H. et al. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. USA 101, 7287–7292 (2004).
  5. Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010).
  6. Mauger, D.M. & Weeks, K.M. Toward global RNA structure analysis. Nat. Biotechnol. 28, 1178–1179 (2010).
  7. Underwood, J.G. et al. FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods 7, 995–1001 (2010).
  8. Lucks, J.B. et al. Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. USA 108, 11063–11068 (2011).
  9. Weeks, K.M. RNA structure probing dash seq. Proc. Natl. Acad. Sci. USA 108, 10933–10934 (2011).
  10. Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2014).
  11. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J.S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).
  12. Grohman, J.K. et al. A guanosine-centric mechanism for RNA chaperone function. Science 340, 190–195 (2013).
  13. Wilkinson, K.A. et al. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol. 6, e96 (2008).
  14. Gherghe, C. et al. Definition of a high-affinity Gag recognition structure mediating packaging of a retroviral RNA genome. Proc. Natl. Acad. Sci. USA 107, 19248–19253 (2010).
  15. Tyrrell, J., McGinnis, J.L., Weeks, K.M. & Pielak, G.J. The cellular environment stabilizes adenine riboswitch RNA structure. Biochemistry 52, 8777–8785 (2013).
  16. McGinnis, J.L. & Weeks, K.M. Ribosome RNA assembly intermediates visualized in living cells. Biochemistry 53, 3237–3247 (2014).
  17. Mortimer, S.A. & Weeks, K.M. A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J. Am. Chem. Soc. 129, 4144–4145 (2007).
  18. Weeks, K.M. & Mauger, D.M. Exploring RNA structural codes with SHAPE chemistry. Acc. Chem. Res. 44, 1280–1291 (2011).
  19. Rice, G.M., Leonard, C.W. & Weeks, K.M. RNA secondary structure modeling at consistent high accuracy using differential SHAPE. RNA 20, 846–854 (2014).
  20. Merino, E.J., Wilkinson, K.A., Coughlan, J.L. & Weeks, K.M. RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 127, 4223–4231 (2005).
  21. Spitale, R.C. et al. RNA SHAPE analysis in living cells. Nat. Chem. Biol. 9, 18–20 (2013).
  22. Hajdin, C.E. et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl. Acad. Sci. USA 110, 5498–5503 (2013).
  23. Steen, K.-A., Rice, G.M. & Weeks, K.M. Fingerprinting noncanonical and tertiary RNA structures by differential SHAPE reactivity. J. Am. Chem. Soc. 134, 13160–13163 (2012).
  24. Doty, P., Boedtker, H., Fresco, J.R., Haselkorn, R. & Litt, M. Secondary structure in ribonucleic acids. Proc. Natl. Acad. Sci. USA 45, 482–499 (1959).
  25. Huynen, M., Gutell, R. & Konings, D. Assessing the reliability of RNA folding using statistical mechanics. J. Mol. Biol. 267, 1104–1112 (1997).
  26. Mathews, D.H. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10, 1178–1190 (2004).
  27. Watts, J.M. et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716 (2009).
  28. Staple, D.W. & Butcher, S.E. Pseudoknots: RNA structures with diverse functions. PLoS Biol. 3, e213 (2005).
  29. Brierley, I., Pennell, S. & Gilbert, R.J.C. Viral RNA pseudoknots: versatile motifs in gene expression and replication. Nat. Rev. Microbiol. 5, 598–610 (2007).
  30. Paillart, J.-C., Skripkin, E., Ehresmann, B., Ehresmann, C. & Marquet, R. In vitro evidence for a long range pseudoknot in the 5′-untranslated and matrix coding regions of HIV-1 genomic RNA. J. Biol. Chem. 277, 5995–6004 (2002).
  31. Resch, W., Ziermann, R., Parkin, N., Gamarnik, A. & Swanstrom, R. Nelfinavir-resistant, amprenavir-hypersusceptible strains of human immunodeficiency virus type 1 carrying an N88S mutation in protease have reduced infectivity, reduced replication capacity, and reduced fitness and process the Gag polyprotein precursor aberrantly. J. Virol. 76, 8659–8666 (2002).
  32. Matoulkova, E., Michalova, E., Vojtesek, B. & Hrstka, R. The role of the 3′ untranslated region in post-transcriptional regulation of protein expression in mammalian cells. RNA Biol. 9, 563–576 (2012).
  33. Gilmartin, G.M., Fleming, E.S. & Oetjen, J. Activation of HIV-1 pre-mRNA 3′ processing in vitro requires both an upstream element and TAR. EMBO J. 11, 4419–4428 (1992).
  34. Klasens, B.I., Thiesen, M., Virtanen, A. & Berkhout, B. The ability of the HIV-1 AAUAAA signal to bind polyadenylation factors is controlled by local RNA structure. Nucleic Acids Res. 27, 446–454 (1999).
  35. Ghadessy, F.J. & Holliger, P. Compartmentalized self-replication: a novel method for the directed evolution of polymerases and other enzymes. Methods Mol. Biol. 352, 237–248 (2007).
  36. Chen, T. & Romesberg, F.E. Directed polymerase evolution. FEBS Lett. 588, 219–229 (2014).
  37. Pollom, E. et al. Comparison of SIV and HIV-1 genomic RNA structures reveals impact of sequence evolution on conserved and non-conserved structural motifs. PLoS Pathog. 9, e1003294 (2013).
  38. Deigan, K.E., Li, T.W., Mathews, D.H. & Weeks, K.M. Accurate SHAPE-directed RNA structure determination. Proc. Natl. Acad. Sci. USA 106, 97–102 (2009).
  39. Lorsch, J.R., Bartel, D.P. & Szostak, J.W. Reverse transcriptase reads through a 2′–5′ linkage and a 2′-thiophosphate in a template. Nucleic Acids Res. 23, 2811–2814 (1995).
  40. Patterson, J.T., Nickens, D.G. & Burke, D.H. HIV-1 reverse transcriptase pausing at bulky 2′ adducts is relieved by deletion of the RNase H domain. RNA Biol. 3, 163 (2006).
  41. Roth, M.J., Tanese, N. & Goff, S.P. Purification and characterization of murine retroviral reverse transcriptase expressed in Escherichia coli. J. Biol. Chem. 260, 9326–9335 (1985).
  42. Beckman, R.A., Mildvan, A.S. & Loeb, L.A. On the fidelity of DNA replication: manganese mutagenesis in vitro. Biochemistry 24, 5810–5817 (1985).
  43. Wilkinson, K.A., Merino, E.J. & Weeks, K.M. Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc. 1, 1610–1616 (2006).
  44. Williams, R. et al. Amplification of complex gene libraries by emulsion PCR. Nat. Methods 3, 545–550 (2006).
  45. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  46. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
  47. Reuter, J.S. & Mathews, D.H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010).
  48. Byun, Y. & Han, K. PseudoViewer: web application and web service for visualizing RNA pseudoknots and secondary structures. Nucleic Acids Res. 34, W416–W422 (2006).
  49. Zhang, J., Chung, T. & Oldenburg, K. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J. Biomol. Screen. 4, 67–73 (1999).
  50. Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).
  51. Talkish, J., May, G., Lin, Y., Woolford, J.L. & McManus, C.J. Mod-seq: high-throughput sequencing for chemical probing of RNA structure. RNA 20, 713–720 (2014).
  52. Adachi, A. et al. Production of acquired immunodeficiency syndrome-associated retrovirus in human and nonhuman cells transfected with an infectious molecular clone. J. Virol. 59, 284–291 (1986).
  53. Derdeyn, C.A. et al. Sensitivity of human immunodeficiency virus type 1 to the fusion inhibitor T-20 is modulated by coreceptor specificity defined by the V3 loop of gp120. J. Virol. 74, 8358–8367 (2000).
  54. Harrich, D. et al. Role of SP1-binding domains in in vivo transcriptional regulation of the human immunodeficiency virus type 1 long terminal repeat. J. Virol. 63, 2585–2591 (1989).
  55. Jabara, C.B., Jones, C.D., Roach, J., Anderson, J.A. & Swanstrom, R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc. Natl. Acad. Sci. USA 108, 20166–20171 (2011).
  56. Magoč, T. & Salzberg, S.L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).

Author information

  1. These authors contributed equally to this work.

    • Nathan A Siegfried,
    • Steven Busan &
    • Greggory M Rice

Affiliations

  1. Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina, USA.

    • Nathan A Siegfried,
    • Steven Busan,
    • Greggory M Rice &
    • Kevin M Weeks
  2. Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, North Carolina, USA.

    • Julie A E Nelson

Contributions

The SHAPE-MaP strategy was conceived and designed by N.A.S. and K.M.W. SHAPE experiments were performed by N.A.S. and G.M.R. HIV-1 replication assays were designed and performed by N.A.S. and J.A.E.N. The SHAPE-MaP data analysis pipeline was created by S.B. RNA folding and motif discovery analyses were conceived and created by G.M.R. and K.M.W. All authors collaborated in interpreting the experiments and writing the manuscript.

Competing financial interests

N.A.S., S.B. and K.M.W. are listed as inventors on a US provisional patent application based on elements of this work.

Supplementary information

PDF files

  1. Supplementary Text and Figures (6,053 KB)

    Supplementary Figures 1–12 and Supplementary Tables 1–3

Excel files

  1. Supplementary Data 1 (770 KB)

    Full SHAPE dataset for the HIV-1 RNA genome.

Zip files

  1. Supplementary Data 2 (47 KB)

    Structure models for each well-defined region in the HIV-1 RNA genome, in connect-table format.

  2. Supplementary Data 4 (239 KB)

    Complete differential SHAPE-MaP data for the model RNAs reported in Figure 3.

  3. Supplementary Software 1 (79 KB)

    Software pipeline for analyzing MaP data. Illustrated in Supplementary Figure 1.

  4. Supplementary Software 2 (210 KB)

    Pipeline for automated windowed modeling approach for folding long RNAs and calculating pairing probabilities and Shannon entropies. Illustrated in Supplementary Figure 2.

Text files

  1. Supplementary Data 3 (195 KB)

    Pairing probabilities for HIV-1 nucleotides, in tab-delimited text format.
