Determination of RNA structural diversity and its role in HIV-1 RNA splicing

An Author Correction to this article was published on 20 November 2020

This article has been updated


Human immunodeficiency virus 1 (HIV-1) is a retrovirus with a ten-kilobase single-stranded RNA genome. HIV-1 must express all of its gene products from a single primary transcript, which undergoes alternative splicing to produce diverse protein products that include structural proteins and regulatory factors [1,2]. Despite the critical role of alternative splicing, the mechanisms that drive the choice of splice site are poorly understood. Synonymous RNA mutations that lead to severe defects in splicing and viral replication indicate the presence of unknown cis-regulatory elements [3]. Here we use dimethyl sulfate mutational profiling with sequencing (DMS-MaPseq) to investigate the structure of HIV-1 RNA in cells, and develop an algorithm that we name ‘detection of RNA folding ensembles using expectation–maximization’ (DREEM), which reveals the alternative conformations that are assumed by the same RNA sequence. Contrary to previous models that have analysed population averages [4], our results reveal heterogeneous regions of RNA structure across the entire HIV-1 genome. In addition to confirming that in vitro characterized [5] alternative structures for the HIV-1 Rev responsive element also exist in cells, we discover alternative conformations at critical splice sites that influence the ratio of transcript isoforms. Our simultaneous measurement of splicing and intracellular RNA structure provides evidence for the long-standing hypothesis [6,7,8] that heterogeneity in RNA conformation regulates splice-site use and viral gene expression.

Fig. 1: Development and validation of DREEM algorithm for analysis of alternative RNA structures.
Fig. 2: The formation of alternative structures at HIV-1 RRE is driven by intrinsic RNA thermodynamics.
Fig. 3: Alternative RNA structures at the A3 splice acceptor site regulate splice site use.
Fig. 4: Landscape of heterogeneity in HIV-1 RNA.

Data availability

Sequencing data can be obtained from the Gene Expression Omnibus (GEO) database using accession number GSE131506. All other data are available from the corresponding author upon reasonable request.

Code availability

The following programs were used. For sequence alignment, Bowtie2. For code development, Python v. 3.6.7. For read trimming, TrimGalore 0.4.1. For read quality assessment, FastQC v.0.11.8. For RNA secondary structure analysis, RNAstructure v.6.0.1. For calculating post-mapping statistics, Picard 2.18.7. For visualization of RNA secondary structure, VARNA v.3.93. For HIV-1 splicing analysis, For generating splice plots, R version 3.5.1. For figure construction, Adobe Illustrator CC 2019. For data analysis, Microsoft Excel 2018. For plot generation, Plotly v.3.2.1. The DREEM clustering algorithm is available at


  1. Purcell, D. F. & Martin, M. A. Alternative splicing of human immunodeficiency virus type 1 mRNA modulates viral protein expression, replication, and infectivity. J. Virol. 67, 6365–6378 (1993).

  2. Ocwieja, K. E. et al. Dynamic regulation of HIV-1 mRNA populations analyzed by single-molecule enrichment and long-read sequencing. Nucleic Acids Res. 40, 10345–10355 (2012).

  3. Takata, M. A. et al. Global synonymous mutagenesis identifies cis-acting RNA elements that regulate HIV-1 splicing and replication. PLoS Pathog. 14, e1006824 (2018).

  4. Watts, J. M. et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716 (2009).

  5. Sherpa, C., Rausch, J. W., Le Grice, S. F., Hammarskjold, M. L. & Rekosh, D. The HIV-1 Rev response element (RRE) adopts alternative conformations that promote different rates of virus replication. Nucleic Acids Res. 43, 4676–4686 (2015).

  6. Warf, M. B. & Berglund, J. A. Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem. Sci. 35, 169–178 (2010).

  7. Shepard, P. J. & Hertel, K. J. Conserved RNA secondary structures promote alternative splicing. RNA 14, 1463–1469 (2008).

  8. Singh, N. N., Lee, B. M. & Singh, R. N. Splicing regulation in spinal muscular atrophy by an RNA structure formed by long-distance interactions. Ann. NY Acad. Sci. 1341, 176–187 (2015).

  9. Huthoff, H. & Berkhout, B. Two alternating structures of the HIV-1 leader RNA. RNA 7, 143–157 (2001).

  10. Abbink, T. E., Ooms, M., Haasnoot, P. C. & Berkhout, B. The HIV-1 leader RNA conformational switch regulates RNA dimerization but does not regulate mRNA translation. Biochemistry 44, 9058–9066 (2005).

  11. Zubradt, M. et al. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat. Methods 14, 75–82 (2017).

  12. Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

  13. Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010).

  14. Spasic, A., Assmann, S. M., Bevilacqua, P. C. & Mathews, D. H. Modeling RNA secondary structure folding ensembles using SHAPE mapping data. Nucleic Acids Res. 46, 314–323 (2018).

  15. Homan, P. J. et al. Single-molecule correlated chemical probing of RNA. Proc. Natl Acad. Sci. USA 111, 13858–13863 (2014).

  16. Sengupta, A., Rice, G. M. & Weeks, K. M. Single-molecule correlated chemical probing reveals large-scale structural communication in the ribosome and the mechanism of the antibiotic spectinomycin in living cells. PLoS Biol. 17, e3000393 (2019).

  17. Ding, Y. & Lawrence, C. E. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 31, 7280–7301 (2003).

  18. Halvorsen, M., Martin, J. S., Broadaway, S. & Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 6, e1001074 (2010).

  19. Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).

  20. Tian, S., Kladwang, W. & Das, R. Allosteric mechanism of the V. vulnificus adenine riboswitch resolved by four-dimensional chemical mapping. eLife 7, e29602 (2018).

  21. Lemay, J. F. et al. Comparative study between transcriptionally- and translationally-acting adenine riboswitches reveals key differences in riboswitch regulatory mechanisms. PLoS Genet. 7, e1001278 (2011).

  22. Zaug, A. J. & Cech, T. R. Analysis of the structure of Tetrahymena nuclear RNAs in vivo: telomerase RNA, the self-splicing rRNA intron, and U2 snRNA. RNA 1, 363–374 (1995).

  23. Emery, A., Zhou, S., Pollom, E. & Swanstrom, R. Characterizing HIV-1 splicing by using next-generation sequencing. J. Virol. 91, e02515-16 (2017).

  24. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).

  25. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).

  26. Liu, Y. et al. The roles of five conserved lentiviral RNA structures in HIV-1 replication. Virology 514, 1–8 (2018).

  27. Kondo, Y., Oubridge, C., van Roon, A. M. & Nagai, K. Crystal structure of human U1 snRNP, a small nuclear ribonucleoprotein particle, reveals the mechanism of 5′ splice site recognition. eLife 4, e04986 (2015).

  28. Cornilescu, G. et al. Structural analysis of multi-helical RNAs by NMR-SAXS/WAXS: application to the U4/U6 di-snRNA. J. Mol. Biol. 428 (5 Pt A), 777–789 (2016).

  29. Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).

  30. Faustino, N. A. & Cooper, T. A. Pre-mRNA splicing and human disease. Genes Dev. 17, 419–437 (2003).

  31. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  32. Darty, K., Denise, A. & Ponty, Y. VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25, 1974–1975 (2009).

  33. Adachi, A. et al. Production of acquired immunodeficiency syndrome-associated retrovirus in human and nonhuman cells transfected with an infectious molecular clone. J. Virol. 59, 284–291 (1986).

  34. Lahm, H. W. & Stein, S. Characterization of recombinant human interleukin-2 with micromethods. J. Chromatogr. A 326, 357–361 (1985).



The following reagents were obtained through the NIH AIDS Reagent Program, Division of AIDS, NIAID, NIH: human recombinant IL-2 from M. Gately and HIV-1NL4-3 infectious molecular clone (pNL4-3) from M. Martin (cat. no. 114). This work was supported in part by the NIH (R21AI134365), the Center of HIV-1 RNA Studies (CRNA) NIH U54AI50470, the Smith Family Foundation and the Burroughs Wellcome Fund.

Author information



V.D.A.C., H.S., M.G., S.P., M.D.E., L.M., A.T.P. and S.R. developed and wrote the DREEM clustering algorithm and analysed validation studies. P.J.T. performed all cell and virus RNA modification assays. P.G. performed all in vitro RNA modification assays. P.J.T., P.G. and S.R. analysed HIV-1 RRE and A3 RNA structure data. P.J.T., S.R., T.Z. and P.B. designed mutants. T.Z. produced mutant plasmids. A.E. and R.S. performed splicing analysis assays. P.J.T. generated the genome-wide DMS-MaPseq library. P.J.T., H.S., P.G. and S.R. analysed genome-wide library data. T.C.T.L. conducted the U4 and U6 experiment. P.J.T. and S.R. wrote the manuscript. P.J.T., P.G. and S.R. created the figures. P.J.T., V.D.A.C., P.G., H.S., A.T.P., R.S., P.B., D.R.K., A.T. and S.R. edited the manuscript and figures.

Corresponding author

Correspondence to Silvi Rouskin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Alan Frankel, Daniel Herschlag, Alain Laederach and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 DREEM clustering pipeline for DMS-MaPseq data.

a, A read x″ is represented as a series of D bits, in which D is the length of the read. A base is denoted by the bit 1 if it is mutated away from the reference, and by 0 otherwise. K is the number of clusters in the sample. \(\mu_k = \{\mu_{k1}, \ldots, \mu_{ki}, \ldots, \mu_{kD}\}\) is the mutation profile of cluster k, and \(\pi_k\) is the mixing proportion of cluster k such that \(\sum_{k=1}^{K} \pi_k = 1\) for k = 1 to K. The model parameters μ and π are randomly initialized. In the expectation step, reads are assigned probabilistically to clusters and the likelihood of observing the data given the model parameters is computed. In the maximization step, the mixing proportion is calculated from the read assignments and the mutation profiles are updated for each cluster to maximize the expectation value of the complete case likelihood. The expectation steps alternate with the maximization steps until the likelihood converges. The likelihood function is derived using Bernoulli mixture models modified to account for missing data in the form of the underrepresentation of reads with adjacent mutations. b, Mutational distance distribution between bases in denatured DMS-modified total RNA. The mutation distance versus frequency is plotted, between two DMS-reactive positions (that is, A or C to A or C; shown as yellow bars) and between one DMS-reactive position and a background mutation (for example, mutation owing to sequencing error) (that is, A or C to T or G; shown as blue bars). The blue bars demonstrate the frequency of observing two mutations due to background.
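
The expectation and maximization steps described in panel a can be illustrated with a minimal sketch of EM for a plain Bernoulli mixture model (ref. 12). This is not the DREEM implementation: it omits the correction for under-represented reads with adjacent mutations, and the function name, initialization range and convergence tolerance are assumptions made for illustration only.

```python
import numpy as np

def bernoulli_mixture_em(reads, n_clusters, n_iter=200, tol=1e-6, seed=0):
    """Fit a plain Bernoulli mixture to an (N, D) matrix of 0/1 reads.

    Hypothetical helper: no handling of missing data or adjacent mutations.
    Returns mutation profiles mu (K, D), mixing proportions pi (K,) and the
    log-likelihood evaluated at the last expectation step.
    """
    rng = np.random.default_rng(seed)
    reads = np.asarray(reads, dtype=float)
    n_reads, n_bases = reads.shape
    # Random initialization of the model parameters (range is an assumption).
    mu = rng.uniform(0.01, 0.2, size=(n_clusters, n_bases))
    pi = np.full(n_clusters, 1.0 / n_clusters)
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(n_iter):
        # Expectation step: log P(read, cluster), then soft assignments.
        log_joint = (reads @ np.log(mu).T
                     + (1.0 - reads) @ np.log1p(-mu).T
                     + np.log(pi))
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        ll = float(log_norm.sum())
        # Maximization step: update mixing proportions and mutation profiles.
        weight = resp.sum(axis=0) + 1e-12  # guard against empty clusters
        pi = weight / weight.sum()
        mu = np.clip((resp.T @ reads) / weight[:, None], 1e-6, 1.0 - 1e-6)
        if ll - prev_ll < tol:  # likelihood has converged
            break
        prev_ll = ll
    return mu, pi, ll
```

Because EM can settle in a local optimum, a practical run would typically repeat the fit from several random seeds and keep the solution with the highest log-likelihood.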

Extended Data Fig. 2 DREEM clustering identifies and quantifies individual structures from in vitro mixing experiments.

Structure 1 and structure 2 sequences were in vitro-transcribed and refolded, mixed in different proportions and probed with DMS-MaPseq. The region used for DREEM clustering covers nucleotides 21–135 (labelled as 1–115 on the figure), which excludes the primers used for RT–PCR (that have no DMS-induced mutations) and is identical in sequence for the two structures, except for the A > C mutation at position 94. Position 94 is masked during analysis. The topmost panel shows the DMS reactivity pattern of structure 1 by itself and structure 2 by itself. The rest of the panels show the clustering results at specified mixing ratios (n = 1).

Extended Data Fig. 3 Secondary structure models for the V. vulnificus add riboswitch.

a, Percentages for each cluster detected in the presence or absence of 5 mM adenine to the add riboswitch. b, In vitro structure models obtained from probing add using DMS-MaPseq followed by DREEM, colour-coded by normalized DMS signal. The ApoB and ApoB alternative structures represent the off state, which is incompetent for ligand binding. ApoA represents the on state. Previously identified helices are boxed and labelled.

Extended Data Fig. 4 DREEM clustering reveals an equilibrium of four-stem and five-stem structures for the in vitro-folded HIV-1 RRE.

a, Population-average DMS-MaPseq data for in vitro-transcribed, refolded and DMS-treated (or untreated) samples. b, Scatter plots showing the reproducibility of the DMS signal from the DREEM clustering results between two replicates with different DMS modification conditions. Replicate 1 was modified in 0.25% DMS and replicate 2 was modified with 2.5% DMS. R² is Pearson’s R². c, DREEM clustering data from b were used as constraints to generate RNA structure models. The models derived for clusters 1 and 2 from replicate 1 are shown, colour-coded by normalized DMS signal.
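
The replicate comparison in panel b relies on the squared Pearson correlation, which can be computed as in the short sketch below. The function name is hypothetical, and the inputs are assumed to be per-base DMS signals for the same positions in the two replicates.

```python
import numpy as np

def pearson_r2(signal_a, signal_b):
    """Squared Pearson correlation between two per-base DMS signal vectors."""
    a = np.asarray(signal_a, dtype=float)
    b = np.asarray(signal_b, dtype=float)
    r = np.corrcoef(a, b)[0, 1]  # off-diagonal entry of the 2x2 matrix
    return float(r ** 2)
```

Pearson correlation is unchanged by rescaling either input, so the measure remains meaningful when the two replicates differ in overall modification rate, as with 0.25% and 2.5% DMS.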

Extended Data Fig. 5 The HIV-1 RRE forms two stable alternative structures in CD4+ T cells.

a, Schematic of DMS treatment in primary cells and isolated virions. b, DMS-MaPseq probing of the intracellular HIV-1NL4-3 RRE in CD4+ T cells was used as input for DREEM clustering. Two clusters passed the BIC test and were used as constraints on the folding using RNAstructure. Structural models are colour-coded by normalized DMS reactivity; bases not covered by the region of PCR are coloured in grey. Data used to construct models are representative data from n = 2 biologically independent experiments.
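
The BIC test referred to above (ref. 24) penalizes the maximized log-likelihood by the number of free parameters. A minimal sketch follows; the parameter count (one mutation rate per base per cluster plus K − 1 independent mixing proportions) and the stopping rule (accept an extra cluster only while the BIC decreases) are assumptions for illustration and may differ from the choices made in DREEM.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Schwarz criterion: n_params * ln(n_obs) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def n_free_params(n_clusters, n_bases):
    # Assumed count: K * D mutation rates plus K - 1 mixing proportions.
    return n_clusters * n_bases + (n_clusters - 1)

def choose_n_clusters(log_likelihoods, n_reads, n_bases):
    """log_likelihoods[i] is the maximized log-likelihood for K = i + 1."""
    best_k = 1
    best_bic = bic(log_likelihoods[0], n_free_params(1, n_bases), n_reads)
    for k in range(2, len(log_likelihoods) + 1):
        score = bic(log_likelihoods[k - 1], n_free_params(k, n_bases), n_reads)
        if score >= best_bic:  # the extra cluster does not pay for itself
            break
        best_k, best_bic = k, score
    return best_k
```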

Extended Data Fig. 6 The A3 splice site forms alternative structures in vitro.

A 472-nt A3 sequence from the HIV-1NHG strain was in vitro-transcribed, refolded and probed with DMS-MaPseq. Models based on DREEM clustering for the local structures that form at the A3 site are shown, colour-coded by the normalized DMS signal. Percentages of clusters 1 and 2 come from n = 1 experiment, as determined by DREEM.

Extended Data Fig. 7 Splice site use in additional A3 mutants.

a, Structure models illustrating the mutant design for A3SL mut 4 and A3SL mut 5. b, Splice site use for A3SL mut 4 and A3SL mut 5 for splice sites A1–A5, reported as fold change compared to Δvpr HIV-1NHG. Central bar represents the mean, and error bars indicate s.d. n = 4 biologically independent experiments. c, Average fraction of transcripts using the A3 site, compared to the percentage of cluster 1 (A3SL), as determined by DREEM (n = 1) for A3SL mut 1–5. Mutants are colour-coded. A dot indicates a multiply spliced (MS) HIV-1 transcript, and a triangle indicates a singly spliced (SS) HIV-1 transcript.

Extended Data Fig. 8 Structural models of A3SL mut 1 and A3SL mut 4.

a, Structural models for A3SL mut 1 derived from n = 1 experiment, after DREEM clustering. Pink box, region of mutations; blue box, splice site. Exonic splicing enhancer (ESE) and exonic splicing silencer (ESS) binding sites are shown. b, Structural models made using DMS-MaPseq data from HEK293T cells transfected with Δvpr HIV-1NHG A3SL mut 4. Dark blue box, sequence of the A3 splice site; pink box, location of the mutations. Splice enhancer and suppressor binding sites are highlighted; purple, ESS2p; blue, ESEtat; orange, ESE2; and green, ESS2. Percentages of each cluster come from n = 1 experiment.

Extended Data Fig. 9 Quality control for the generation of the genome-wide HIV-1NHG library.

a, Coverage of HIV-1 genome with DMS-MaPseq data from HEK293T cells transfected with HIV-1NHG. b, Moving average of A and C mutation frequency in 100-nt windows after DMS-MaPseq, compared to the moving average of T and G mutation frequency. c, DMS-MaPseq data from HEK293T cells transfected with HIV-1NHG were used as input for DREEM. Local 80-nt window from Fig. 4 for the RRE region was used for clustering. Percentages of clusters 1 and 2 come from n = 1 experiment. Nucleotides are colour-coded on the basis of the normalized DMS signal; bases outside of the window used for clustering are coloured in grey. d, The A3 splice site was analysed using DMS-MaPseq and DREEM clustering from genome-wide data from HEK293T cells transfected with HIV-1NHG. Percentages of clusters 1 and 2 come from n = 1 experiment, as determined by DREEM. Nucleotides are colour-coded on the basis of the normalized DMS signal. e, A region of the HIV-1 genome in the pol coding region (nucleotides 2,000–2,120, based on HIV-1NHG genomic RNA coordinates) was analysed using DMS-MaPseq and DREEM clustering from genome-wide data from HEK293T cells transfected with HIV-1NHG. Two clusters passed the BIC test in adjacent 80-nt windows that overlapped by 40 nt. The two 80-nt windows were combined to make the structural models. The range of proportions of each cluster comes from the individual windows of n = 1 experiment. Nucleotides were colour-coded on the basis of the normalized DMS signal.
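
The windowing used in panels b and e can be sketched as follows. The function names are hypothetical, and treating window coordinates as inclusive is an assumption, as is the use of a simple unweighted mean for the moving average.

```python
import numpy as np

def tile_windows(start, end, size=80, step=40):
    """Inclusive (first, last) coordinates of overlapping windows in [start, end]."""
    return [(s, s + size - 1) for s in range(start, end - size + 2, step)]

def moving_average(values, window=100):
    """Unweighted moving average over every full window of the given length."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")
```

With these assumptions, `tile_windows(2000, 2120)` yields the two 80-nt windows 2,000–2,079 and 2,040–2,119, which overlap by 40 nt.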

Extended Data Fig. 10 Proportion of minor clusters across the HIV-1 genome and the U1, U4 and U6 core-domain structural models.

a, Each bar shows the proportion of a minor cluster of an 80-nt window as a function of genome position for regions in the HIV-1NHG genome dataset that are covered by at least 100,000 reads and for which two clusters pass the BIC test. b, U1 structural prediction from HEK293T cells transfected with HIV-1NHG. The abundance of the cluster was obtained from DREEM clustering. c, In vitro DMS-modified U4 and U6 core-domain RNA. The structure is shown for a population average; cluster 2 did not pass the BIC test. d, Left, difference in BIC test value between K = 2 and K = 1, normalized to the value for K = 2 for the real whole-genome dataset. Each bar represents an 80-nt window across the HIV-1 genome. Orange, windows in which only one cluster was detected according to the BIC test; blue, windows for which two clusters passed the BIC test. Right, the same plot as shown in the left panel from simulated data, for which the mutations were randomly distributed but had the same average number of mutations per read as the true data.
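
One way to build a control dataset like the simulated data in panel d is sketched below. It assumes that "randomly distributed" means every position in a read is mutated independently with the same probability, chosen to match the observed mean number of mutations per read; the legend does not specify the simulation procedure, so this is an illustration and not the authors' method.

```python
import numpy as np

def simulate_uniform_reads(reads, seed=0):
    """Random 0/1 reads with the same shape and mean mutations per read.

    Hypothetical helper: mutations are placed independently and uniformly,
    so any cluster structure in the input is destroyed.
    """
    reads = np.asarray(reads)
    n_reads, n_bases = reads.shape
    rate = reads.sum() / (n_reads * n_bases)  # per-base mutation probability
    rng = np.random.default_rng(seed)
    return (rng.random((n_reads, n_bases)) < rate).astype(np.uint8)
```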

Extended Data Fig. 11 Shannon entropy across the HIV-1 genome, and A4 and A5 splice sites.

a, Overlay of the HIV-1NHG genomic organization on top of a Shannon entropy plot. Each dot represents an 80-nt window, in which Shannon entropy was calculated from DMS reactivity. The top plot is the major cluster and the bottom is the minor cluster. b, Scatter plot of Gini index versus Shannon entropy for the major and minor clusters (n = 1). R² is Pearson’s R². c, Structural model of the transcription-activation-region stem loop from the genome-wide DMS-MaPseq and DREEM data. d, Structural model from two clusters found using the genome-wide DMS-MaPseq and DREEM data for a window containing four splice acceptor sites (A4a, A4b, A4c and A5). Splice sites are boxed. Nucleotides are colour-coded on the basis of the normalized DMS signal.
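
The two per-window statistics compared in panel b can be sketched as below. The legend does not state how Shannon entropy was derived from DMS reactivity; this sketch assumes the reactivities in a window are normalized to sum to 1 and treated as a probability distribution, which may not match the calculation used for the figure. The Gini index uses the standard rank-based formula, and both function names are hypothetical.

```python
import numpy as np

def shannon_entropy(reactivity):
    """Entropy in bits of reactivities normalized to sum to 1 (assumed definition)."""
    x = np.asarray(reactivity, dtype=float)
    total = x.sum()
    if total <= 0:
        return 0.0
    p = x[x > 0] / total  # zero-reactivity bases contribute nothing
    return float(-(p * np.log2(p)).sum())

def gini_index(reactivity):
    """Gini index: 0 for perfectly even signal, near 1 when a few bases dominate."""
    x = np.sort(np.asarray(reactivity, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total <= 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * (ranks * x).sum() / (n * total) - (n + 1) / n)
```

Under this definition an evenly reactive window has high entropy and a low Gini index, whereas a window with signal concentrated on a few bases has low entropy and a high Gini index.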

Cite this article

Tomezsko, P.J., Corbin, V.D.A., Gupta, P. et al. Determination of RNA structural diversity and its role in HIV-1 RNA splicing. Nature 582, 438–442 (2020).
