Mutational interference mapping experiment (MIME) for studying RNA structure and function

Journal name: Nature Methods
Volume: 12
Pages: 866–872
DOI: 10.1038/nmeth.3490

Abstract

RNA regulates many biological processes; however, identifying functional RNA sequences and structures is complex and time-consuming. We introduce a method, mutational interference mapping experiment (MIME), to identify, at single-nucleotide resolution, the primary sequence and secondary structures of an RNA molecule that are crucial for its function. MIME is based on random mutagenesis of the RNA target followed by functional selection and next-generation sequencing. Our analytical approach allows the recovery of quantitative binding parameters and permits the identification of base-pairing partners directly from the sequencing data. We used this method to map the binding site of the human immunodeficiency virus-1 (HIV-1) Pr55Gag protein on the viral genomic RNA in vitro, and showed that, by analyzing permitted base-pairing patterns, we could model RNA structure motifs that are crucial for protein binding.


Figures

    Figure 1: A model protein-RNA interaction and a schematic for MIME.

    (a) Specific recognition of the 5′ region of the HIV-1 genomic RNA by the structural protein Pr55Gag, used here as a model of protein-RNA interaction. The 5′ region of the HIV-1 genomic RNA folds into several independent structural domains. From 5′ to 3′, these are the transactivation response (TAR) element (which mediates efficient transcription), the polyadenylation signal (PolyA), the primer-binding site (PBS) domain and four closely spaced stem-loop structures termed SL1–SL4. SL1 promotes dimerization of the genomic RNA, SL2 contains the major splice donor (SD) site, SL3 has historically been considered the major packaging signal (Psi), and SL4 forms an unstable stem loop containing the gag AUG translation initiation codon. The HIV-1 structural protein Pr55Gag comprises the matrix (MA), capsid (CA) and nucleocapsid (NC) domains, as well as the terminal p6 domain. (b–d) MIME relies on the random introduction of mutations into an RNA target, the physical separation of functional from nonfunctional RNA, and the quantification of RNA mutations in each population by next-generation sequencing. (b) Mutations (red circles) are introduced by error-prone PCR into the 5′ region of the HIV-1 genomic RNA (nucleotides 1–532). Including a T7 initiation site in the 5′ primer (red solid line) allows RNA (black dashed line) to be transcribed directly from the PCR product. (c) Functional RNA is separated from nonfunctional RNA using a His-tagged protein and magnetic beads. (d) RNA is reverse transcribed into DNA (black solid line), which is randomly fragmented into 100- to 500-bp fragments. Illumina-specific adaptors (green solid line) and sample-specific barcodes for multiplexing are added to the fragmented DNA, and the fragments are sequenced on an Illumina HiSeq 2000 instrument in paired-end 100 mode.
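The mutagenesis step in panel (b) can be pictured with a toy simulation. The sketch below is a minimal model under stated assumptions, not the authors' protocol: each nucleotide is substituted independently with probability `mu`, the replacement base is drawn uniformly from the other three, and insertions, deletions and polymerase-specific substitution biases are ignored. The names `mutagenize` and `make_library` are hypothetical.

```python
import random

RNA_BASES = "ACGU"


def mutagenize(seq: str, mu: float, rng: random.Random) -> str:
    """Copy `seq`, substituting each position independently with probability mu.

    Toy model of error-prone PCR (assumption): uniform substitution spectrum,
    no insertions or deletions.
    """
    out = []
    for base in seq:
        if rng.random() < mu:
            # Replace with one of the three other RNA bases, chosen uniformly.
            out.append(rng.choice([b for b in RNA_BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_library(seq: str, mu: float, n: int, seed: int = 0) -> list[str]:
    """Generate n independently mutagenized variants of seq (reproducible via seed)."""
    rng = random.Random(seed)
    return [mutagenize(seq, mu, rng) for _ in range(n)]


if __name__ == "__main__":
    wt = "ACGU" * 133  # 532-nt placeholder sequence, not the HIV-1 sequence
    library = make_library(wt, mu=0.006, n=1000)
    mean_muts = sum(sum(a != b for a, b in zip(v, wt)) for v in library) / len(library)
    print(f"mean substitutions per variant: {mean_muts:.2f}")  # expected to be near 0.006 * 532
```

Real error-prone PCR has a biased mutational spectrum, so a uniform substitution model is only a baseline for reasoning about coverage of single and double mutants.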

    Figure 2: Modeling the effects of mutations on Pr55Gag-RNA interaction.

    (a) At low Pr55Gag concentrations, low-affinity mutant RNA (red circles) may not be bound in sufficient amounts, so mutations that further impair binding may go undetected in the protein-bound fraction. Conversely, at high Pr55Gag concentrations, high-affinity mutant RNA (blue triangles) may be completely protein bound. (b) The ratio between the mutation frequencies detected in the unbound and bound fractions for three different Pr55Gag/RNA ratios. (c) RNAs with distinct mutations compete for Pr55Gag binding according to the law of mass action. The binding affinity of mutated RNA (indicated with stars) relative to wild-type RNA (no stars), Kdm(i)/Kdw(i), can be computed from a ratio of ratios (bound versus unbound and wild-type versus mutant RNA; see equations (2)–(4)). (d) Statistical attributes of the relative binding affinities Kdm(i)/Kdw(i) are obtained through a resampling procedure: the effect of mutation m at position i is recomputed for every position j at which the nucleotide is wild type (w) (see equation (6)). Because the number of sequence reads spanning both i and j decreases as the distance between i and j increases, this is analogous to a jackknife resampling procedure. For robustness, Kd values were calculated only for pairs of positions with at least 50% coverage relative to i. (e) The relative binding affinity of the mutations m(max) that maximally affect binding at nucleotide position i, expressed as log2(Kdm(max),w(i,*)/Kdw,w(i,*)), where * indicates that all eligible wild-type positions j are considered. Medians of the maximally affecting mutations are depicted as red lines; the 5th–95th percentiles are shown in light gray and the 25th–75th percentiles in dark gray. (f) P values computed from the resampling procedure (equations (7) and (8)) after correction by the Benjamini–Hochberg false-discovery rate method (BHFDR) show positions where Pr55Gag binding is significantly decreased upon mutation m(max) at position i in the viral RNA (P > 0.05, small black dots; P < 0.05, open circles; P < 0.01, closed circles; P < 0.001, large black dots). Data were obtained from two independent libraries. The procedure is explained in detail in Supplementary Note 2.
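Panels (c), (d) and (f) combine three computations, sketched together below under assumptions that should be checked against equations (2)–(8) in the Online Methods, which are not reproduced here. The sketch assumes simple single-site binding in which every RNA variant competes for the same pool of free Pr55Gag at concentration P, so that for any variant X the bound-to-unbound ratio is B_X/U_X = P/Kd_X. Dividing this ratio for mutant m by that for wild type w cancels P and gives Kd_m/Kd_w = (U_m/U_w)/(B_m/B_w), which can be read directly from sequencing counts within each fraction. Reads are modeled as hypothetical dictionaries mapping covered positions to observed bases, and the one-sided empirical P value (k + 1)/(n + 1) is a generic stand-in, not equations (7)–(8).

```python
import math
from statistics import median, quantiles

Read = dict[int, str]  # covered position -> observed base (hypothetical read model)


def _pair_counts(reads: list[Read], i: int, m: str, j: int, wt: str) -> tuple[int, int]:
    """Count reads with (m at i, wild type at j) and (wild type at i, wild type at j)."""
    n_mut = n_wt = 0
    for r in reads:
        if i in r and j in r and r[j] == wt[j]:
            if r[i] == m:
                n_mut += 1
            elif r[i] == wt[i]:
                n_wt += 1
    return n_mut, n_wt


def resampled_log2_rel_kd(bound: list[Read], unbound: list[Read], i: int, m: str,
                          wt: str, min_cov_frac: float = 0.5) -> list[float]:
    """One log2(Kd_m/Kd_w) estimate per eligible wild-type position j != i."""
    reads = bound + unbound
    cov_i = sum(1 for r in reads if i in r)
    values = []
    for j in range(len(wt)):
        if j == i or cov_i == 0:
            continue
        # Skip j unless reads spanning i and j reach min_cov_frac of the coverage at i.
        if sum(1 for r in reads if i in r and j in r) < min_cov_frac * cov_i:
            continue
        b_m, b_w = _pair_counts(bound, i, m, j, wt)
        u_m, u_w = _pair_counts(unbound, i, m, j, wt)
        if 0 in (b_m, b_w, u_m, u_w):
            continue  # ratio of ratios is undefined with an empty cell
        values.append(math.log2((u_m / u_w) / (b_m / b_w)))
    return values


def summarize(values: list[float]) -> tuple[float, float, float]:
    """Median and 5th/95th percentiles of the resampling distribution."""
    cuts = quantiles(values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return median(values), cuts[0], cuts[-1]


def empirical_p_impaired(values: list[float]) -> float:
    """One-sided P value for Kd_m/Kd_w > 1 (binding impaired); generic stand-in."""
    k = sum(v <= 0 for v in values)
    return (k + 1) / (len(values) + 1)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg step-up adjusted P values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda k: pvals[k])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * n / rank)
        adjusted[k] = running_min
    return adjusted
```

Because each resampled value conditions on a different wild-type position j, the spread of the resulting distribution reflects how stable the estimate is across read subsets, which is what the percentile bands in panel (e) summarize.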

    Figure 3: Single-variation analysis identifies RNA structure and sequence requirements for Pr55Gag binding.

    (a–c) Single-variation analysis can detect nucleotide positions involved in Watson-Crick and wobble base pairing, which may help distinguish between the various proposed structural models of the HIV-1 RNA genome. Green and yellow circles denote A and C nucleotides at which structure-affecting mutations significantly impair Pr55Gag binding, whereas structure-preserving mutations impair Pr55Gag binding markedly less than all (green) or one (yellow) of the other structure-affecting mutations. Blue circles denote G and U nucleotides at which structure-affecting mutations significantly impair Pr55Gag binding and structure-preserving mutations preserve or improve it. Numerical results for the statistical tests can be found in Supplementary Data 3. (a) Model with an extended SL1, a short SL2 helix and a long-range interaction between the CU-rich region and a region proximal to the AUG start codon (refs. 14,15). (b) Model with a short SL1, a long SL2, the PBS1 lower stem and a CU-rich hairpin (ref. 17). (c) Model with a short SL1, a long SL2, the PBS1′ lower stem and a CU-rich loop (ref. 16). (d) RNA helices are formed mainly by A-U and G-C Watson-Crick base pairs and by G-U wobble base pairs. Certain mutations can maintain RNA structure by converting a Watson-Crick base pair into a wobble pair, or vice versa. (e) Single-variation analysis gives information on likely pairing partners. WT, wild type.
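The base-pairing logic behind panels (d) and (e) can be made concrete. The sketch below, with hypothetical function names, enumerates which substitutions keep a canonical pair (A-U, G-C or G-U) intact. It shows why A and C positions always have exactly one structure-preserving substitution (a Watson-Crick pair becomes a wobble pair), whereas G and U positions have one only when they sit in a G-U wobble pair (which then becomes a Watson-Crick pair), consistent with the green/yellow versus blue categories. Non-canonical pairs and tertiary contacts are ignored.

```python
# Canonical base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs(a: str, b: str) -> bool:
    """True if bases a and b form a canonical pair."""
    return (a, b) in CANONICAL


def preserving_substitutions(wt_base: str, partner: str) -> list[str]:
    """Substitutions at wt_base that still pair canonically with partner."""
    if not pairs(wt_base, partner):
        raise ValueError(f"{wt_base}-{partner} is not a canonical pair")
    return [b for b in "ACGU" if b != wt_base and pairs(b, partner)]


def candidate_partners(wt_base: str, tolerated: set[str]) -> list[str]:
    """Partners consistent with the set of substitutions that preserve binding.

    Simplifying assumption: binding is preserved exactly by the
    structure-preserving substitutions and by no others.
    """
    return [p for p in "ACGU"
            if pairs(wt_base, p) and set(preserving_substitutions(wt_base, p)) == tolerated]


if __name__ == "__main__":
    for wt_base, partner in [("A", "U"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "A"), ("U", "G")]:
        print(wt_base, partner, preserving_substitutions(wt_base, partner))
```

For example, if only the G→A substitution is tolerated at a G position, `candidate_partners("G", {"A"})` points to a U partner, which is the kind of inference illustrated in panel (e).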

    Figure 4: Identification of RNA structures important for Pr55Gag binding through covariation analysis.

    (a) If residues at positions i and j interact to form a structural element important for Pr55Gag binding, certain single mutations at either position may impair binding, whereas certain combinations of double mutations can restore it. Such pairs of mutations should therefore confer positive epistasis. (b–e) Epistasis and stem plots; shading indicates the strength of the epistatic interaction or the stem potential. (b) Epistasis plot showing all pairs of residues conferring positive epistasis (potentially interacting nucleotides; see equation (11)). (c) Stem plot showing all potential stems of length ≥3 bp. Stem scores accumulate epistatic interaction scores along a diagonal, and stems are penalized if they include base pairs without epistatic support (see equations (14) and (15)). Residues identified as base-paired by single-variation analysis (compare Fig. 3) are indicated by red and blue marks on the x and y axes. Conflicting stems are outlined in orange. (d) Zoomed-in view of the region with conflicting stem assignments. Residues identified as base-paired by single-variation analysis are superimposed as blue and red circles. (e) Stem plot after conflict resolution, showing the stems with the highest support. The computational procedure is detailed in Supplementary Note 4.
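A toy version of the epistasis and stem scoring in panels (b), (c) and (e) follows. Equations (11), (14) and (15) are not reproduced here, so the formulas are assumptions chosen to match the caption's description. With L_i = log2(Kd_i/Kd_w) for single mutants and L_ij for double mutants, epistasis is taken as ε_ij = L_i + L_j − L_ij, which is positive when the double mutant binds better than the two single effects predict. A stem of consecutive pairs (i+k, j−k) accumulates ε along the diagonal and loses a fixed, hypothetical `penalty` for each pair without positive support. Conflicts are resolved greedily by score, which may differ from the procedure in Supplementary Note 4.

```python
def epistasis(L_single: list[float], L_double: dict[tuple[int, int], float]) -> list[list[float]]:
    """Symmetric matrix eps[i][j] = L_i + L_j - L_ij; 0 where no double mutant was scored."""
    n = len(L_single)
    eps = [[0.0] * n for _ in range(n)]
    for (i, j), l_ij in L_double.items():
        eps[i][j] = eps[j][i] = L_single[i] + L_single[j] - l_ij
    return eps


def stem_score(eps, i: int, j: int, length: int, penalty: float) -> float:
    """Sum epistasis over pairs (i+k, j-k); pairs without positive support cost `penalty`."""
    score = 0.0
    for k in range(length):
        e = eps[i + k][j - k]
        score += e if e > 0 else -penalty
    return score


def candidate_stems(eps, min_len: int = 3, min_loop: int = 3, penalty: float = 1.0):
    """All stems (score, i, j, length) with positive score, best first."""
    n = len(eps)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            # Longest stem from (i, j) that still closes a loop of >= min_loop nucleotides.
            max_len = (j - i + 1 - min_loop) // 2
            for length in range(min_len, max_len + 1):
                s = stem_score(eps, i, j, length, penalty)
                if s > 0:
                    stems.append((s, i, j, length))
    return sorted(stems, key=lambda t: -t[0])


def resolve_conflicts(stems):
    """Greedily keep the highest-scoring stems that share no nucleotide."""
    used, kept = set(), []
    for score, i, j, length in stems:
        nts = {i + k for k in range(length)} | {j - k for k in range(length)}
        if not nts & used:
            kept.append((score, i, j, length))
            used |= nts
    return kept
```

In a small synthetic example where single mutations at positions 0–2 and 9–11 each cost 2 log2 units but the compensatory double mutants (0,11), (1,10) and (2,9) are neutral, the top-ranked and only surviving stem is the three-pair helix formed by those positions.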

    Figure 5: Mapping the mutational effects on binding affinity to the structure of the HIV genomic RNA.

    (a) Mapping of the effects of mutations (expressed as log2(Kdm(max)/Kdw)) onto the structure of the HIV-1 genome. Positions where mutation improves Pr55Gag binding are shown in blue, and positions where mutation impairs Pr55Gag binding are shown in red. Positions that do not pass the quality criteria (see Online Methods) are shown in gray. (b) Zoomed-in view of SL1. (c) Zoomed-in view of SL3.

    Supplementary Fig. 1: Raw mutation rate of DNA libraries

    Two independent mutant libraries were generated by PCR mutagenesis. Sequencing of these libraries showed that we had introduced a mutation rate (μ) of approximately 0.006 mutations per nucleotide, with a coefficient of variation of 37% in both libraries. Wild-type DNA was sequenced to measure errors introduced during library preparation and sequencing; it showed a mutation rate of 0.0013 mutations per nucleotide, with a coefficient of variation of 117%.
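A quick back-of-the-envelope check helps interpret these rates. Assuming mutations occur independently along the 532-nt target of Fig. 1 (an assumption; primer regions and the actual sequenced span may differ), the number of mutations per molecule is approximately Poisson with mean λ = μL = 0.006 × 532 ≈ 3.2, so only about 4% of molecules would carry no mutation. The helper names below are hypothetical, and the coefficient of variation uses the population standard deviation, which may not match the authors' choice.

```python
import math
from statistics import mean, pstdev


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def coefficient_of_variation(rates: list[float]) -> float:
    """Population standard deviation divided by the mean of per-position rates."""
    return pstdev(rates) / mean(rates)


if __name__ == "__main__":
    mu, length = 0.006, 532
    lam = mu * length  # expected substitutions per molecule under the Poisson assumption
    for k in range(6):
        print(f"P({k} mutations) = {poisson_pmf(k, lam):.3f}")
    print(f"P(>=1 mutation) = {1 - poisson_pmf(0, lam):.3f}")
```

A mean of roughly three mutations per molecule means most reads carrying a given mutation also carry others, which is why the analysis in Fig. 2 conditions on wild-type positions j rather than assuming isolated single mutants.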

    Supplementary Fig. 2: Comparison of relative Kd values obtained from experimental replicates

    Left: Significant median Kdm(max)/Kdw values computed from technical replicate 1 (experiment 1) compared with replicate 2 (experiment 2), showing the Pearson correlation between the quantitative Kd estimates and the corresponding P value for testing non-zero correlation. ‘Median’ denotes the median of the resampling distribution obtained after computational analysis (see eq. (6) in the Online Methods). Significance is determined using eqs. (7)–(8) in the Online Methods after multiple-testing correction. The vertical and horizontal dashed lines separate Kdm(max)/Kdw estimates that are significantly larger or smaller than 1 in the respective experiments and thus allow a qualitative comparison of the relative Kd estimates between experimental replicates. For example, for 1% of all depicted relative Kd estimates, replicate 2 yielded Kdm(max)/Kdw > 1 whereas replicate 1 yielded Kdm(max)/Kdw < 1. The diagonal dashed line indicates the line of unity. Right: Cross-tabulation of the estimated number of positions significantly altering (increasing or decreasing) Pr55Gag binding. Congruent estimates, in terms of labeling a position as significantly vs. not significantly altering Pr55Gag binding, are found on the diagonal; incongruent estimates are found on the anti-diagonal.

    Supplementary Fig. 3: Relative Kd values for each individual Pr55Gag concentration vs. values obtained for the pooled (all concentrations) dataset

    Left panels: Significant median log2(Kdm(max)/Kdw) values of pooled data compared with (a) a low Pr55Gag:RNA ratio (20 nM Pr55Gag), (b) an equimolar Pr55Gag:RNA ratio (200 nM Pr55Gag) and (c) a high Pr55Gag:RNA ratio (2000 nM Pr55Gag), showing the Pearson correlation between the quantitative Kd estimates and the corresponding P value for testing non-zero correlation. ‘Median’ denotes the median of the resampling distribution obtained after computational analysis (see eq. (6) in the Online Methods). Significance is determined using eqs. (7)–(8) in the Online Methods after multiple-testing correction. Filled circles indicate Kdm(max)/Kdw estimates that are significantly smaller or greater than 1 in both datasets, whereas red unfilled circles indicate estimates that are significantly smaller or greater than 1 in the pooled dataset but not in the dataset for the individual Pr55Gag concentration. Blue unfilled circles indicate estimates that are not significantly smaller or greater than 1 in the pooled dataset but are significantly altered in the dataset for the individual Pr55Gag concentration. The vertical and horizontal dashed lines separate Kdm(max)/Kdw estimates that are significantly larger or smaller than 1 in the respective datasets and thus allow a qualitative comparison of the relative Kd estimates between datasets. The indicated percentages are computed on the basis of all Kdm(max)/Kdw estimates that are significantly smaller or greater than 1 in both datasets (filled circles). Right panels: Cross-tabulation of the estimated number of positions significantly altering (increasing or decreasing) Pr55Gag binding. Congruent estimates, in terms of labeling a position as significantly vs. not significantly altering Pr55Gag binding, are found on the diagonal; incongruent estimates are found on the anti-diagonal.

    Supplementary Fig. 4: Binding of Pr55Gag to the core-binding domain by filter binding assay

    (a) Schematic of the RNAs. (b) Binding of Pr55Gag to RNA (NL 1–532) and to RNA corresponding to the Pr55Gag core-binding domain (NL 227–377), analyzed by filter binding assay.
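Comparing binding curves, such as those for NL 1–532 and NL 227–377, or relating filter binding to MIME, requires reducing each curve to a Kd. The fitting model is not stated in this caption, so the sketch below assumes a single-site hyperbolic isotherm, fraction bound = Bmax·P/(Kd + P), with protein in excess over RNA. `fit_kd` is a hypothetical helper that grid-searches Kd on a log scale and solves Bmax by linear least squares at each grid point, avoiding external dependencies.

```python
def fraction_bound(p: float, kd: float, bmax: float = 1.0) -> float:
    """Single-site isotherm (assumes free protein is approximately total protein)."""
    return bmax * p / (kd + p)


def fit_kd(conc, bound, kd_grid=None):
    """Least-squares (Kd, Bmax): Kd on a log grid, Bmax in closed form per grid point."""
    if kd_grid is None:
        # 0.1 to 1e5 in the same units as conc, 100 points per decade.
        kd_grid = [10 ** (k / 100) for k in range(-100, 501)]
    best = None
    for kd in kd_grid:
        x = [p / (kd + p) for p in conc]
        # For fixed Kd the model is linear in Bmax: Bmax = sum(x*y) / sum(x*x).
        bmax = sum(xi * yi for xi, yi in zip(x, bound)) / sum(xi * xi for xi in x)
        sse = sum((bmax * xi - yi) ** 2 for xi, yi in zip(x, bound))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]
```

Fitting wild-type and mutant curves separately then gives a relative affinity as `kd_mut / kd_wt`, the quantity compared with MIME in Supplementary Fig. 8. The grid spacing (about 2.3% per step) bounds the precision of the estimate.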

    Supplementary Fig. 5: Interaction between MS2 coat protein and MS2 stem loop in non-cognate RNA

    (a) Mapping of the effect of mutations on relative binding affinity, depicted as Kdm(max)/Kdw, onto a structural representation of the HIV-1 genome including the MS2 stem loop, which was inserted between the TAR and polyA hairpins. (b) Median effect of mutations on relative binding affinity, Kdm(max)/Kdw, in the HIV-1 genome containing the MS2 stem loop (red) and in the HIV-1 genome without the MS2 stem loop (blue; negative control). Weak nonspecific binding of the MS2 coat protein to the polyA and SL1 hairpins was detected irrespective of the presence of the MS2 stem loop. (c) Effect of specific mutations on log2(Kdm/Kdw) for positions 52–92; the MS2 stem loop spans positions 61–79. Box-and-whisker plots show the effect of each class of mutation on relative binding affinity, expressed as log2(Kdm/Kdw); black-and-white circles show medians, boxes show quartiles (25% and 75%) and whiskers show extremes (excluding outliers). Mutation classes are colour coded: red, mutated to A; green, mutated to C; blue, mutated to G; yellow, mutated to U. (d) Zoomed-in view of the MS2 stem-loop structure showing the mutation with the maximum effect on relative Kd, Kdm(max)/Kdw. Colour scale: red, decreased binding affinity; blue, increased binding affinity.

    Supplementary Fig. 6: Single-variation analysis of the Pr55Gag core binding domain

    Positions where certain classes of mutations have statistically different effects on Pr55Gag binding. Green circles: structure-affecting mutations significantly impair binding, and structure-preserving mutations impair binding significantly less than all other possible mutations. Yellow circles: structure-affecting mutations significantly impair binding, and structure-preserving mutations impair binding significantly less than one of the other possible mutations. Blue circles: structure-modulating mutations significantly impair Pr55Gag binding, whereas structure-preserving mutations improve it. Grey circles: other sites of interest. Box-and-whisker plots show the effect of each class of mutation on relative binding affinity, expressed as log2(Kdm/Kdw); black-and-white circles show medians, boxes show quartiles (25% and 75%) and whiskers show extremes (excluding outliers). Mutation classes are colour coded: red, mutated to A; green, mutated to C; blue, mutated to G; yellow, mutated to U. Statistical tests are described in the Online Methods; P values are listed in Supplementary Data 3.

    Supplementary Fig. 7: Binding of Pr55Gag to mutant RNA

    HIV-1 genomic RNA containing point mutations was tested in filter binding experiments. The mutations were chosen to be representative of the MIME data and include positions predicted to be single stranded or double stranded, positions showing strong evidence of RNA structure, and several positions where MIME indicates that covariation maintains Pr55Gag binding.

    Supplementary Fig. 8: Comparison of relative dissociation constants obtained from filter binding experiments vs. MIME

    Comparison of relative dissociation constants (expressed as log2(Kdm/Kdw)) for binding of Pr55Gag to HIV-1 genomic RNA containing single point mutations, obtained from the filter binding experiments shown in Supplementary Figure 7 vs. MIME. (a) Quantitative comparison of the relative dissociation constants. Black circles have the median log2(Kdm/Kdw) from the filter binding experiments and from MIME as x and y coordinates, respectively. Grey vertical bars show the quartiles of the MIME prediction, and grey horizontal bars show the quartiles of the log2(Kdm/Kdw) estimate from the filter binding experiment. The line of unity is indicated by a diagonal red dashed line, and the vertical and horizontal dashed black lines separate filter binding and MIME estimates that increase or decrease binding. (b) Cross-tabulation comparing the qualitative outcomes of the two assays. For example, the upper-left entry shows the number of Kdm/Kdw values estimated to be significantly different from 1 by MIME but not by the filter binding experiment, whereas the upper-right entry shows the number of Kdm/Kdw values estimated to be significantly different from 1 by both assays.
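The dashed lines in panel (a) split the plane into quadrants by the sign of each estimate. A minimal sketch of that qualitative comparison follows, assuming paired medians (filter binding, MIME) of log2(Kdm/Kdw). The function names are hypothetical, and a value of exactly 0 is counted as "not impaired".

```python
from collections import Counter


def quadrant(fb: float, mime: float) -> tuple[str, str]:
    """Direction of change in each assay; log2 ratio > 0 means a larger Kd (impaired binding)."""
    return ("impaired" if fb > 0 else "not impaired",
            "impaired" if mime > 0 else "not impaired")


def quadrant_counts(pairs: list[tuple[float, float]]) -> Counter:
    """Number of mutations falling in each (filter binding, MIME) quadrant."""
    return Counter(quadrant(fb, mime) for fb, mime in pairs)


def direction_agreement(pairs: list[tuple[float, float]]) -> float:
    """Fraction of mutations for which both assays agree on the direction of change."""
    return sum((fb > 0) == (mime > 0) for fb, mime in pairs) / len(pairs)
```

This sign-based view complements the significance-based cross-tabulation in panel (b); the two can disagree when small effects fall on opposite sides of zero without being significant.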

    Supplementary Fig. 9: Mapping of effects of mutations on binding affinity to the structure of the HIV genomic RNA

    Effect of mutations mapped onto the structure of the HIV-1 genome proposed by Siegfried et al. The mutation with the maximum effect on Kd is shown, expressed as log2(Kdm(max)/Kdw). Positions in red significantly impair Pr55Gag binding when mutated; positions in blue significantly improve Pr55Gag binding when mutated. Positions with no significant change are shown in grey.

    Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. E. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods (2014). doi:10.1038/nmeth.3029

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Wan, Y., Kertesz, M., Spitale, R.C., Segal, E. & Chang, H.Y. Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011).
  2. Tome, J.M. et al. Comprehensive analysis of RNA-protein interactions by high-throughput sequencing–RNA affinity profiling. Nat. Methods 11, 683–688 (2014).
  3. Lambert, N. et al. RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA-binding proteins. Mol. Cell 54, 887–900 (2014).
  4. Buenrostro, J.D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568 (2014).
  5. Fowler, D.M. et al. High-resolution mapping of protein sequence–function relationships. Nat. Methods 7, 741–746 (2010).
  6. Pitt, J.N. & Ferré-D'Amaré, A.R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010).
  7. Zykovich, A., Korf, I. & Segal, D.J. Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res. 37, e151 (2009).
  8. Guenther, U.-P. et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502, 385–388 (2013).
  9. Auyeung, V.C., Ulitsky, I., McGeary, S.E. & Bartel, D.P. Beyond secondary structure: primary-sequence determinants license pri-miRNA hairpins for processing. Cell 152, 844–858 (2013).
  10. D'Souza, V. & Summers, M.F. How retroviruses select their genomes. Nat. Rev. Microbiol. 3, 643–655 (2005).
  11. Didierlaurent, L. et al. Role of HIV-1 RNA and protein determinants for the selective packaging of spliced and unspliced viral RNA and host U6 and 7SL RNA in virus particles. Nucleic Acids Res. 39, 8915–8927 (2011).
  12. McBride, M.S., Schwartz, M.D. & Panganiban, A.T. Efficient encapsidation of human immunodeficiency virus type 1 vectors and further characterization of cis elements required for encapsidation. J. Virol. 71, 4544–4554 (1997).
  13. Clever, J.L., Miranda, D. Jr. & Parslow, T.G. RNA structure and packaging signals in the 5′ leader region of the human immunodeficiency virus type 1 genome. J. Virol. 76, 12381–12387 (2002).
  14. Wilkinson, K.A. et al. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol. 6, e96 (2008).
  15. Watts, J.M. et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716 (2009).
  16. Abbink, T.E.M., Ooms, M., Haasnoot, P.C.J. & Berkhout, B. The HIV-1 leader RNA conformational switch regulates RNA dimerization but does not regulate mRNA translation. Biochemistry 44, 9058–9066 (2005).
  17. Siegfried, N.A., Busan, S., Rice, G.M., Nelson, J.A.E. & Weeks, K.M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
  18. Paillart, J.C. et al. First snapshots of the HIV-1 RNA structure in infected cells and in virions. J. Biol. Chem. 279, 48397–48403 (2004).
  19. Huthoff, H. & Berkhout, B. Two alternating structures of the HIV-1 leader RNA. RNA 7, 143–157 (2001).
  20. Stephenson, J.D. et al. Three-dimensional RNA structure of the major HIV-1 packaging signal region. Structure 21, 951–962 (2013).
  21. Lu, K. et al. NMR detection of structures in the HIV-1 5′-leader RNA that regulate genome packaging. Science 334, 242–245 (2011).
  22. Damgaard, C.K., Dyhr-Mikkelsen, H. & Kjems, J. Mapping the RNA-binding sites for human immunodeficiency virus type-1 gag and NC proteins within the complete HIV-1 and -2 untranslated leader regions. Nucleic Acids Res. 26, 3667–3676 (1998).
  23. Skripkin, E., Paillart, J.C., Marquet, R., Ehresmann, B. & Ehresmann, C. Identification of the primary site of the human immunodeficiency virus type 1 RNA dimerization in vitro. Proc. Natl. Acad. Sci. USA 91, 4945–4949 (1994).
  24. Paillart, J.C., Skripkin, E., Ehresmann, B., Ehresmann, C. & Marquet, R. A loop-loop 'kissing' complex is the essential part of the dimer linkage of genomic HIV-1 RNA. Proc. Natl. Acad. Sci. USA 93, 5572–5577 (1996).
  25. Paillart, J.-C., Shehu-Xhilaga, M., Marquet, R. & Mak, J. Dimerization of retroviral RNA genomes: an inseparable pair. Nat. Rev. Microbiol. 2, 461–472 (2004).
  26. Nikolaitchik, O., Rhodes, T.D., Ott, D. & Hu, W.S. Effects of mutations in the human immunodeficiency virus type 1 gag gene on RNA packaging and recombination. J. Virol. 80, 4691–4697 (2006).
  27. Heng, X. et al. Identification of a minimal region of the HIV-1 5′ leader required for RNA dimerization, NC binding and packaging. J. Mol. Biol. 417, 224–239 (2012).
  28. Sakuragi, J., Iwamoto, A. & Shioda, T. Dissociation of genome dimerization from packaging functions and virion maturation of human immunodeficiency virus type 1. J. Virol. 76, 959–967 (2002).
  29. Sakuragi, J.-I., Sakuragi, S. & Shioda, T. Minimal region sufficient for genome dimerization in the human immunodeficiency virus type 1 virion and its potential roles in the early stages of viral replication. J. Virol. 81, 7985–7992 (2007).
  30. Lever, A., Gottlinger, H., Haseltine, W. & Sodroski, J. Identification of a sequence required for efficient packaging of human immunodeficiency virus type 1 RNA into virions. J. Virol. 63, 4085–4087 (1989).
  31. Clavel, F. & Orenstein, J.M. A mutant of human immunodeficiency virus with reduced RNA packaging and abnormal particle morphology. J. Virol. 64, 5230–5234 (1990).
  32. Abd El-Wahab, E.W. et al. Specific recognition of the HIV-1 genomic RNA by the gag precursor. Nat. Commun. 5, 4304 (2014).
  33. Kutluay, S.B. et al. Global changes in the RNA-binding specificity of HIV-1 gag regulate virion genesis. Cell 159, 1096–1109 (2014).
  34. Houzet, L. et al. HIV controls the selective packaging of genomic, spliced viral and cellular RNAs into virions through different mechanisms. Nucleic Acids Res. 35, 2695–2704 (2007).
  35. McKinstry, W.J. et al. Expression and purification of soluble recombinant full-length HIV-1 Pr55Gag protein in Escherichia coli. Protein Expr. Purif. 100, 10–18 (2014).


Author information

Affiliations

  1. Architecture et Réactivité de l'ARN, Institut de Biologie Moléculaire et Cellulaire du Centre National de la Recherche Scientifique, Université de Strasbourg, Strasbourg, France.

    • Redmond P Smyth,
    • Laurence Despons,
    • Serena Bernacchi,
    • Fabrice Jossinet,
    • Jean-Christophe Paillart &
    • Roland Marquet
  2. BGI-Tech, BGI-Shenzhen, Shenzhen, China.

    • Gong Huili &
    • Li Weixi
  3. Centre for Virology, Burnet Institute, Melbourne, Victoria, Australia.

    • Marcel Hijnen &
    • Johnson Mak
  4. Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria, Australia.

    • Marcel Hijnen &
    • Johnson Mak
  5. Faculty of Health, School of Medicine, Deakin University, Geelong, Victoria, Australia.

    • Johnson Mak
  6. Commonwealth Scientific and Industrial Research Organization, Australian Animal Health Laboratory, Geelong, Victoria, Australia.

    • Johnson Mak
  7. Department of Mathematics and Computer Science, Freie Universität, Berlin, Germany.

    • Max von Kleist

Contributions

R.P.S. and R.M. designed the study. R.P.S. generated the libraries for sequencing and performed the MIME experiments. R.P.S., L.D., F.J. and M.v.K. developed bioinformatic tools. M.v.K. developed binding models and statistical tools. M.H. and J.M. expressed and purified the Pr55Gag protein. S.B. characterized the Pr55Gag protein. G.H. and L.W. performed DNA sequencing. R.P.S., J.-C.P., M.v.K. and R.M. analyzed the data. R.P.S., M.v.K. and R.M. wrote the paper with contributions from the other authors.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Raw mutation rate of DNA libraries (75 KB)

    Two independent mutated libraries were generated by PCR mutagenesis. Sequencing of these mutant libraries showed that we had introduced a mutation rate (μ) of approximately 0.006 mutations per nucleotide, respectively. Coefficient of variation was 37% in both libraries. Wild-type DNA was sequenced to measure errors introduced during library preparation and sequencing. Wild-type DNA showed a mutation rate of 0.0013 mutations per nucleotide with a coefficient of variation of 117%.

  2. Supplementary Figure 2: Comparison of relative Kd values obtained from experimental replicates (65 KB)

    Left: Significant median Kdm(max)/Kdw values computed from technical replicate 1 (experiment 1) compared to replicate/experiment 2, showing the Pearson correlation between the quantitative Kd estimates and the corresponding P-value for assessing non-zero correlation. ‘Median’ denotes the median of the re-sampling distribution obtained after computational analysis (see eq. (6) in the Online Methods. Significance is determined using eqs. (7)-(8) in the Online Methods after multiple test correction. The vertical- and horizontal dashed lines separate Kdm(max)/Kdw estimates that are significantly larger- or smaller than 1 in the respective experiments and thus allow a qualitative assessment of the respective relative Kd estimates between experimental replicates. For example, in 1% of all depicted relative Kd estimates, replicate 2 yielded Kdm(max)/Kdw > 1 vs. Kdm(max)/Kdw < 1 in experimental replicate 1. The diagonal dashed line indicates the line of unity. Right: Cross-tabulation of the estimated number of positions significantly altering (increasing/decreasing) Pr55Gag binding. Congruent estimates, in terms of labeling the position as significantly vs. not significantly altering Pr55Gag binding, are found on the diagonal. Incongruent estimates are found on the anti-diagonal.

  3. Supplementary Figure 3: Relative Kd values for each Pr55Gag concentration respectively vs. values obtained for the pooled (all concentrations) dataset (139 KB)

    Left panels: Significant median log2(Kdm(max)/Kdw) QUOTE   values of pooled data compared to (a) low Pr55Gag:RNA ratio (20nM Pr55Gag). (b) Equimolar Pr55Gag:RNA ratio (200nM Pr55Gag). (c) High Pr55Gag:RNA ratio (2000nM Pr55Gag), showing the Pearson correlation between the quantitative Kd estimates and the corresponding P-value for assessing non-zero correlation. ‘Median’ denotes the median of the re-sampling distribution obtained after computational analysis (see eq.(6) in the Online Methods). Significance is determined using eqs.(7)-(8) in the Online Methods after multiple test correction. Filled circles indicate Kdm(max)/Kdw estimates that are significantly smaller- or greater than 1 using either dataset, whereas red unfilled circles indicate Kdm(max)/Kdw estimates that are significantly smaller/greater 1 in the pooled dataset, but not in the dataset using the individual Pr55Gag concentration. Blue unfilled circles indicate Kdm(max)/Kdw estimates that are not significantly smaller/greater 1 in the pooled dataset, but which are estimated to be significantly altered when using the dataset with the individual Pr55Gag concentration. The vertical- and horizontal dashed lines separate Kdm(max)/Kdw estimates that are significantly larger- or smaller than 1 using the respective datasets and thus allow a qualitative assessment of the respective relative Kd estimates between datasets. The indicated percentages are computed on the bases of all Kdm(max)/Kdw estimates that are significantly smaller- or greater than 1 using either dataset (filled circles). Right panels: Cross-tabulation of the estimated number of positions significantly altering (increasing/decreasing) Pr55Gag binding. Congruent estimates, in terms of labeling the position as significantly vs. not significantly altering Pr55Gag binding, are found on the diagonal. Incongruent estimates are found on the anti-diagonal.

  4. Supplementary Figure 4: Binding of Pr55Gag to the core-binding domain by filter binding assay (37 KB)

    (a) Schematic of RNA (b) Binding of RNA (NL 1-532) and RNA corresponding to Pr55Gag core binding domain (NL 227-377) analyzed by filter binding assay.

  5. Supplementary Figure 5: Interaction between MS2 coat protein and MS2 stem loop in non-cognate RNA (204 KB)

    (a) Mapping of the effect of mutations on relative binding affinity, depicted as Kdm(max)/Kdw QUOTE  , to a structural representation of the HIV-1 genome including the MS2 stem loop, which was inserted between the TAR and polyA hairpins. (b) Median effect of mutations on relative binding affinity, Kdm(max)/Kdw, in the HIV-1 genome containing the MS2 stem loop (red) and the HIV-1 genome without the MS2 stem loop (blue; negative control). Weak unspecific binding of the MS2 coat protein to the polyA and SL1 hairpins was detected irrespective of the presence of MS2 stem loop. (c) Effect of specific mutations on log2(Kdm/Kdw), for positions 52-92. MS2 stem loop spans positions 61-79. Box and whisker plots show effect of each class of mutation on relative binding affinity expressed as log2(Kdm/Kdw) where black and white circle shows median, box shows quartiles (25% and 75%) and whiskers show extremes (excluding outliers). Mutation classes are colour coded: red mutated to A; green mutated to C; blue mutated to G; yellow mutated to U. (d) Zoom on MS2 stem loop structure showing mutation with maximum effect on relative Kd, Kdm(max)/Kdw. Colour scale shows red with decreased binding affinity, blue increased binding affinity.

  6. Supplementary Figure 6: Single variation analysis of Pr55Gag core binding domain (89 KB)

    Positions where certain classes of mutations have statistically different effects on Pr55Gag binding. Green circles: structure-affecting mutations significantly impair binding, and structure-preserving mutations impair binding significantly less than all other possible mutations. Yellow circles: structure-affecting mutations significantly impair binding, and structure-preserving mutations impair binding significantly less than one of the other possible mutations. Blue circles: structure-modulating mutations significantly impair Pr55Gag binding, whereas structure-preserving mutations improve Pr55Gag binding. Grey circles: other sites of interest. Box-and-whisker plots show the effect of each class of mutation on relative binding affinity expressed as log2(Kdm/Kdw): black and white circles show medians, boxes show quartiles (25% and 75%) and whiskers show extremes (excluding outliers). Mutation classes are colour coded: red, mutated to A; green, mutated to C; blue, mutated to G; yellow, mutated to U. Statistical tests are described in the Online Methods; P-values are listed in Supplementary Data 3.

  7. Supplementary Figure 7: Binding of Pr55Gag to mutant RNA (133 KB)

    HIV-1 genomic RNA containing point mutations was tested by filter binding experiments. These mutations were selected to be representative of the MIME data, including positions predicted to be single stranded, double stranded, positions showing strong evidence of RNA structure, and several positions where MIME indicates that co-variation maintains Pr55Gag binding.

  8. Supplementary Figure 8: Comparison of relative dissociation constants obtained from filter binding experiments vs. MIME (40 KB)

    Comparison of relative dissociation constants (expressed as log2(Kdm/Kdw)) for binding of Pr55Gag to HIV-1 genomic RNA containing single point mutations, obtained from the filter binding experiments displayed in Supplementary Figure 7 vs. MIME. The left panel (a) compares the relative dissociation constants quantitatively. Each black circle has the median log2(Kdm/Kdw) from the filter binding experiment and from MIME as its x and y coordinates, respectively. The vertical grey bars show the quartiles of the MIME prediction, whereas the horizontal grey bars show the quartiles of the log2(Kdm/Kdw) estimate obtained from the filter binding experiment. The line of unity is indicated by a diagonal red dashed line, and the vertical and horizontal dashed black lines separate filter binding and MIME estimates, respectively, that increase or decrease binding. The right panel (b) shows a cross-tabulation comparing the qualitative outcomes of the two assays. For example, the upper left entry shows the number of Kdm/Kdw values that were estimated to be significantly different from 1 by MIME but not by the filter binding experiment, whereas the upper right entry shows the number of Kdm/Kdw values that were estimated to be significantly different from 1 by both assays.

  9. Supplementary Figure 9: Mapping of effects of mutations on binding affinity to the structure of the HIV genomic RNA (40 KB)

    Effect of mutations mapped onto the structure of the HIV-1 genome proposed by Siegfried et al. For each position, the mutation with the maximum effect on Kd is shown, expressed as log2(Kdm(max)/Kdw). Positions in red significantly impair Pr55Gag binding when mutated; positions in blue significantly improve Pr55Gag binding when mutated. Positions with no significant change are shown in grey.

    Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. E. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods (2014). doi:10.1038/nmeth.3029
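The Supplementary Figure 3 legend compares pooled and single-concentration estimates in two ways: a Pearson correlation of the log2(Kdm(max)/Kdw) values and a 2 × 2 cross-tabulation of significance calls. The Python sketch below illustrates both summaries under stated assumptions: the significance calls (eqs. (7) and (8) of the Online Methods) are taken as precomputed booleans, the P-value for the correlation is not computed, and the function names and example calls are hypothetical rather than taken from the authors' analysis code.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of log2 ratios.

    Undefined (ZeroDivisionError) if either list is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_tabulate(calls_pooled, calls_single):
    """Count (pooled call, single-concentration call) pairs per position.

    Each call is True if the position significantly alters Pr55Gag binding.
    Returns the 2x2 table as a Counter and the fraction of congruent
    (diagonal) calls.
    """
    if len(calls_pooled) != len(calls_single):
        raise ValueError("both datasets must cover the same positions")
    table = Counter(zip(calls_pooled, calls_single))
    congruent = table[(True, True)] + table[(False, False)]
    return table, congruent / len(calls_pooled)


# Hypothetical calls for five positions (not data from the paper).
table, congruent_fraction = cross_tabulate(
    [True, True, False, False, True],
    [True, False, False, True, True],
)
print(table[(True, True)], congruent_fraction)  # prints: 2 0.6
```

Off-diagonal cells such as `table[(True, False)]` correspond to the red unfilled circles (significant only in the pooled data).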
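The box-and-whisker plots in Supplementary Figures 5c and 6 summarize log2(Kdm/Kdw) for each mutation class by its median, quartiles and whiskers. A minimal Python sketch of such a summary follows. It assumes Tukey's 1.5 × IQR rule for outliers and the "inclusive" quartile method of Python's `statistics.quantiles`; neither choice is stated in the legends, and the example values are invented.

```python
import statistics


def box_summary(values, whisker_k=1.5):
    """Quartiles, whiskers and outliers for one class of mutations.

    Assumes Tukey's rule: points more than whisker_k * IQR beyond the box
    are outliers. Needs at least two values.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker_k * iqr, q3 + whisker_k * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),   # most extreme non-outlier values
        "whisker_high": max(inliers),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical log2(Kdm/Kdw) values grouped by the substituted base.
by_class = {
    "A": [-1.0, -0.5, 0.0, 0.5, 1.0, 4.0],
    "U": [0.2, 0.4, 0.6, 0.8],
}
for base, values in by_class.items():
    print(base, box_summary(values))
```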
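Supplementary Figure 8a plots filter-binding estimates against MIME estimates, with dashed lines at zero separating mutants that weaken binding from those that strengthen it. The sketch below assumes raw Kd values from the filter binding assay and precomputed MIME log2 ratios; it converts Kd pairs to log2(Kdm/Kdw) and counts how often the two assays agree on direction. It ignores significance, which the cross-tabulation in panel b does take into account, and all names and numbers are hypothetical.

```python
import math


def log2_ratio(kd_mutant, kd_wildtype):
    """log2(Kdm/Kdw): positive means the mutation weakens binding."""
    return math.log2(kd_mutant / kd_wildtype)


def direction_agreement(filter_log2, mime_log2):
    """Fraction of mutants on the same side of zero in both assays.

    Geometrically, these are the points in the lower-left and upper-right
    quadrants formed by the dashed black lines. A value of exactly 0 is
    treated as 'not weakening'.
    """
    pairs = list(zip(filter_log2, mime_log2))
    same_side = sum(1 for f, m in pairs if (f > 0) == (m > 0))
    return same_side / len(pairs)


# Hypothetical filter-binding Kd values in nM as (mutant, wild type).
filter_values = [
    log2_ratio(400.0, 100.0),
    log2_ratio(50.0, 100.0),
    log2_ratio(141.0, 100.0),
]
mime_values = [1.5, 0.3, 0.2]  # hypothetical MIME medians
print(direction_agreement(filter_values, mime_values))
```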
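Supplementary Figures 5d and 9 colour each position by the mutation with the maximum effect, Kdm(max)/Kdw. The sketch below assumes that "maximum effect" means the largest absolute log2(Kdm/Kdw) among the substitutions possible at a position and that significance has already been decided elsewhere; the helper names are hypothetical.

```python
def max_effect(class_effects):
    """Pick the substitution with the largest |log2(Kdm/Kdw)| at one position.

    class_effects maps the substituted base ('A', 'C', 'G' or 'U') to its
    median log2(Kdm/Kdw). Returns a (base, log2 ratio) tuple.
    """
    return max(class_effects.items(), key=lambda item: abs(item[1]))


def position_colour(log2_max, significant):
    """Colour used on the structure map for one position."""
    if not significant:
        return "grey"  # no significant change
    # A significant estimate is assumed to be non-zero; Kdm > Kdw means weaker binding.
    return "red" if log2_max > 0 else "blue"


# Hypothetical position whose wild-type base is U, so only A, C and G occur.
base, effect = max_effect({"A": 0.4, "C": -1.2, "G": 0.9})
print(base, effect, position_colour(effect, significant=True))  # prints: C -1.2 blue
```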

PDF files

  1. Supplementary Text and Figures (4,937 KB)

    Supplementary Figures 1–9, Supplementary Tables 1 and 2 and Supplementary Notes 1–4

Excel files

  1. Supplementary Data 1 (40 KB)

    Table containing the complete dataset of relative Kd values for each genome position.

  2. Supplementary Data 2 (96 KB)

    Table containing the complete dataset of relative Kd values for each class of mutation and each genome position.

  3. Supplementary Data 3 (20 KB)

    Table containing the results of the statistical tests used to assess whether different classes of mutation have comparable effects on Pr55Gag binding affinity.

  4. Supplementary Data 4 (10 KB)

    Table containing stems predicted to be important for Pr55Gag binding.

Additional data