Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes

Journal name: Nature Biotechnology
Volume: 32
Pages: 562–568
DOI: 10.1038/nbt.2880

Abstract

RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.

Figures

  1. Figure 1: A massively parallel RNA array for quantitative, high-throughput biochemistry.

    (a) Steps for generating RNA tethered to DNA clusters on a high-throughput, DNA sequencing flow cell. (b) Structure of the MS2 coat protein homodimer bound to the 19-nt hairpin RNA (PDB ID: 2BU1; ref. 33). (c) Images of fluorescently labeled MS2 bound to RNA clusters at increasing concentrations of protein and at time points following perfusion of unlabeled MS2 competitor. Below, fitted sum of Gaussians used to assign fluorescence to clusters. Scale bars, 2.5 μm. (d) Fluorescence decay of MS2 dissociating from clusters containing the consensus (−5C) sequence (t1/2 = 8.39 min). (e) Fit binding curves to clusters labeled in panel c. (f) The probability distribution of binding energies from all clusters with labeled variants; mean Kd = 2.57 nM, 36.8 nM and 415 nM for the −5C, −5U and −5A variants, respectively. (g) Correlation between binding energies reported in the literature and measured on the RNA array (squares, Carey et al., ref. 29; circles, Romaniuk et al., ref. 32). The dashed line indicates our affinity measurement cutoff.

  2. Figure 2: A quantitative map of MS2 binding across RNA sequence variants.

    (a) Distribution of observed RNA variants by number of mutations. (b) Clusters measured per molecular variant as a function of mutation number. A median of ~11 clusters is observed for sequences with ≥4 mutations. Affinities for the consensus (−5C) sequence come from N = 909,385 clusters. Box plots show median and upper/lower quartiles; whiskers show minimum and maximum. (c) Average −ΔΔG of point mutations per position. The −ΔΔG of alanine substitutions (ref. 38) to the MS2 binding surface are shown in parentheses (kBT). Solid and dashed lines represent base and phosphate interactions, respectively. (d) Matrix of −ΔΔG for single and double mutants of the consensus sequence. Inset contains the matrix of −ΔΔG for single and double mutants of the +1G variant. All energies are calculated relative to the consensus sequence (arrow, −ΔΔG = 0 for −5C), and the number of quality-filtered double mutants in each matrix is indicated (M2). Gray: no data (N). (e) Epistasis matrix derived from d allows de novo reconstruction of the hairpin structure.

  3. Figure 3: Decomposition of primary and secondary RNA structure determinants of binding affinity.

    (a) Fit parameters for linear regression model showing position-specific contributions. Energetic components for all possible noncanonical base-pair combinations are shown below. (b) Predicted binding energies of variants with second (M2) and third mutations (M3) in both single- and double-stranded regions. R, Pearson's correlation coefficient. (c,d) Primary (i.e., mean energetic contributions of transitions and transversions) (c) and secondary (d) structure contributions to affinity derived from a were mapped onto the hairpin (PDB ID: 1ZDH; ref. 40). MS2 protein surface color represents electrostatic charge.

  4. Figure 4: Sequence-specific contributions of association and dissociation rates to binding affinity.

    (a) Fractional contribution of dissociation rates for 31 single and 289 double mutants with measurable affinities and dissociation rates. Positions at the base of the hairpin are highlighted. Gray: no data (N). (b,c) Δlog(koff) (b) and Δlog(kon) (c) at the base of the hairpin. M2 = number of quality-filtered double mutants. (d) Distribution of fractional contributions of association (blue, μ = 0.57) and dissociation (red, μ = 0.43) rates to −ΔΔG for all measured mutants (N = 3,029).

  5. Figure 5: Evolutionary landscapes are highly constrained by biophysical requirements.

    (a) Tesseracts describe traversal probabilities for the complete set (N = 24) of mutational paths between low- and high-affinity variants within four mutations. The AUC of the cumulative probability of ranked paths measures evolutionary constraint (EAUC), as modulated by epistasis (ε). (b) Density of cumulative probabilities for the ranked paths of 1,997 measured tesseracts. The fraction of the total path probabilities captured per individual path is shown as a function of path rank in the inset. The cumulative sum of these individual values is integrated to calculate EAUC. KDE, kernel density estimation. (c) Distribution of EAUC scores from observed tesseracts (red), tesseracts with uniform path probabilities (blue) and tesseracts with random affinities (purple) implies a highly structured epistatic landscape. The number of variants significantly constrained (P < 0.01, Benjamini-Hochberg) is indicated for both models. (d,e) Average evolutionary probability (d) and constraint (e) for paths with changes at each position of the hairpin. (f) Intermediate trajectories for base pair A:U→G:C and U:A→G:C transitions. (g) Probability ratio of evolutionary paths passing through G:U versus A:C intermediates by base derived from 696 tesseracts with A:U→G:C base pair transformations.
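
The path counting and constraint score described in the Figure 5 caption can be illustrated with a minimal sketch. Four mutations can be acquired in 4! = 24 orders, and the caption defines constraint as the area under the cumulative probability of ranked paths. How each step is weighted is not stated on this page, so the Boltzmann-style weighting below is an assumption, as are the function names `path_probabilities` and `constraint_auc` and the two toy landscapes.

```python
from itertools import permutations
from math import exp

def path_probabilities(energy, n_mut=4):
    """Probability of each mutational order through a hypercube of genotypes.

    energy maps a genotype bitmask to a binding free energy in kBT
    (lower = tighter binding). ASSUMPTION: each step is chosen among the
    remaining mutations with weight exp(-(E_next - E_current)).
    """
    probs = {}
    for order in permutations(range(n_mut)):
        genotype, p = 0, 1.0
        for m in order:
            remaining = [k for k in range(n_mut) if not genotype & (1 << k)]
            weights = {k: exp(-(energy[genotype | (1 << k)] - energy[genotype]))
                       for k in remaining}
            p *= weights[m] / sum(weights.values())
            genotype |= 1 << m
        probs[order] = p
    return probs

def constraint_auc(probs):
    """Mean of the cumulative probability over paths ranked from most to least likely."""
    ranked = sorted(probs.values(), reverse=True)
    total = sum(ranked)
    running, cumulative = 0.0, []
    for p in ranked:
        running += p / total
        cumulative.append(running)
    return sum(cumulative) / len(cumulative)

# Toy landscape 1 (hypothetical): all 16 genotypes bind equally well.
flat = {g: 0.0 for g in range(16)}

# Toy landscape 2 (hypothetical): only one ordered route is favorable.
on_path = {0b0000: 0.0, 0b0001: -10.0, 0b0011: -20.0, 0b0111: -30.0, 0b1111: -40.0}
rugged = {g: on_path.get(g, 10.0) for g in range(16)}

flat_probs = path_probabilities(flat)
rugged_probs = path_probabilities(rugged)
flat_auc = constraint_auc(flat_probs)
rugged_auc = constraint_auc(rugged_probs)
```

Under this weighting a flat landscape gives every path probability 1/24 and a score of 25/48, while a landscape with a single favorable route pushes the score toward 1. That contrast matches the one the caption draws between uniform path probabilities and observed tesseracts.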

Videos

  1. Video 1: Equilibrium binding and dissociation rate measurements on a sequenced flow cell
    A small region of the flow cell is shown as fluorescently tagged MS2 coat protein is bound at increasing concentrations to the RNA clusters and is then removed from solution to determine dissociation constants. In the first frame, the fluorescence signal from a complementary oligonucleotide annealed to all RNAs is shown. A single cluster is circled (blue), and quantified fluorescence is shown in the inset. Subsequent frames show the quantified fluorescence signal at increasing concentrations of labeled MS2 coat protein. At the end of the binding experiment, the fit for that individual cluster is shown in the inset. The following frames show dissociation of labeled MS2, measured by replacing labeled with unlabeled MS2 in solution, and observing the decay of fluorescence over time. The inset dissociation curve depicts data from all clusters that share the sequence with the circled cluster (−5C variant).
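
As a rough worked example of how the equilibrium and dissociation measurements relate, the sketch below converts the values quoted in the Figure 1 caption into rate constants and relative binding energies. It assumes single-exponential decay (koff = ln 2 / t1/2) and simple two-state binding (Kd = koff/kon). These are textbook relations and not necessarily the fitting procedure used in the study, and the function names are illustrative.

```python
from math import log

# Values quoted in the Figure 1 caption.
KD_NM = {"-5C": 2.57, "-5U": 36.8, "-5A": 415.0}  # mean Kd, nM
T_HALF_MIN = 8.39                                  # -5C dissociation half-life, min

def koff_from_half_life(t_half):
    """Dissociation rate constant, ASSUMING single-exponential decay."""
    return log(2) / t_half

def ddg_kbt(kd, kd_ref):
    """Binding energy change relative to a reference, in kBT.

    Positive values mean weaker binding than the reference.
    """
    return log(kd / kd_ref)

def kon_from(koff, kd):
    """Association rate constant, ASSUMING two-state binding (Kd = koff / kon)."""
    return koff / kd

koff_5c = koff_from_half_life(T_HALF_MIN)       # per minute
kon_5c = kon_from(koff_5c, KD_NM["-5C"])        # per nM per minute
ddg_5u = ddg_kbt(KD_NM["-5U"], KD_NM["-5C"])
ddg_5a = ddg_kbt(KD_NM["-5A"], KD_NM["-5C"])

print(f"koff(-5C) = {koff_5c:.4f} per min")
print(f"kon(-5C)  = {kon_5c:.4f} per nM per min")
print(f"ddG(-5U)  = {ddg_5u:.2f} kBT, ddG(-5A) = {ddg_5a:.2f} kBT")
```

With these inputs the −5U and −5A variants come out roughly 2.7 and 5.1 kBT weaker than the consensus. Because Kd = koff/kon under the two-state assumption, any change in log Kd splits into a change in log koff minus a change in log kon, which is the decomposition the Figure 4 caption reports.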

Accession codes

Primary accessions

Sequence Read Archive

Referenced accessions

Protein Data Bank

References

  1. Keene, J.D. RNA regulons: coordination of post-transcriptional events. Nat. Rev. Genet. 8, 533–543 (2007).
  2. Carey, J., Cameron, V., De Haseth, P.L. & Uhlenbeck, O.C. Sequence-specific interaction of R17 coat protein with its ribonucleic acid binding site. Biochemistry 22, 2601–2610 (1983).
  3. Tsvetanova, N.G., Klass, D.M., Salzman, J. & Brown, P.O. Proteome-wide search reveals unexpected RNA-binding proteins in Saccharomyces cerevisiae. PLoS ONE 5, e12671 (2010).
  4. Scherrer, T., Mittal, N., Janga, S.C. & Gerber, A.P. A screen for RNA-binding proteins in yeast indicates dual functions for many enzymes. PLoS ONE 5, e15499 (2010).
  5. Butter, F., Scheibe, M., Morl, M. & Mann, M. Unbiased RNA-protein interaction screen by quantitative proteomics. Proc. Natl. Acad. Sci. USA 106, 10626–10631 (2009).
  6. Castello, A. et al. Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell 149, 1393–1406 (2012).
  7. Wang, K.C. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124 (2011).
  8. Tsai, M.C. et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693 (2010).
  9. Guttman, M. & Rinn, J.L. Modular regulatory principles of large non-coding RNAs. Nature 482, 339–346 (2012).
  10. Culler, S.J., Hoff, K.G. & Smolke, C.D. Reprogramming cellular behavior with RNA controllers responsive to endogenous proteins. Science 330, 1251–1255 (2010).
  11. Ausländer, S., Ausländer, D., Müller, M., Wieland, M. & Fussenegger, M. Programmable single-cell mammalian biocomputers. Nature 487, 123–127 (2012).
  12. SantaLucia, J. & Turner, D.H. Measuring the thermodynamics of RNA secondary structure formation. Biopolymers 44, 309–319 (1997).
  13. Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010).
  14. Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2014).
  15. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).
  16. Ban, N., Nissen, P., Hansen, J., Moore, P.B. & Steitz, T.A. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289, 905–920 (2000).
  17. Wan, Y., Kertesz, M., Spitale, R.C., Segal, E. & Chang, H.Y. Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011).
  18. Martin, L. et al. Systematic reconstruction of RNA functional motifs with high-throughput microfluidics. Nat. Methods 9, 1192–1194 (2012).
  19. Ray, D. et al. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27, 667–670 (2009).
  20. Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
  21. Araya, C.L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl. Acad. Sci. USA 109, 16858–16863 (2012).
  22. Pitt, J.N. & Ferre-D'Amare, A.R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010).
  23. Guenther, U.-P. et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502, 385–388 (2013).
  24. Matzas, M. et al. High-fidelity gene synthesis by retrieval of sequence-verified DNA identified using high-throughput pyrosequencing. Nat. Biotechnol. 28, 1291–1294 (2010).
  25. Myllykangas, S., Buenrostro, J.D., Natsoulis, G., Bell, J.M. & Ji, H.P. Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing. Nat. Biotechnol. 29, 1024–1027 (2011).
  26. Uemura, S. et al. Real-time tRNA transit on single translating ribosomes at codon resolution. Nature 464, 1012–1017 (2010).
  27. Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659–664 (2011).
  28. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
  29. Carey, J., Lowary, P.T. & Uhlenbeck, O.C. Interaction of R17 coat protein with synthetic variants of its ribonucleic acid binding site. Biochemistry 22, 4723–4730 (1983).
  30. Lim, F. & David, S.P. Mutations that increase the affinity of a translational repressor for RNA. Nucleic Acids Res. 22, 3748–3752 (1994).
  31. Valegård, K., Murray, J.B., Stockley, P.G., Stonehouse, N.J. & Liljas, L. Crystal structure of an RNA bacteriophage coat protein-operator complex. Nature 371, 623–626 (1994).
  32. Romaniuk, P.J., Lowary, P., Wu, H.N., Stormo, G. & Uhlenbeck, O.C. RNA binding site of R17 coat protein. Biochemistry 26, 1563–1568 (1987).
  33. Grahn, E. et al. Structural basis of pyrimidine specificity in the MS2 RNA hairpin-coat-protein complex. RNA 7, 1616–1627 (2001).
  34. Bardwell, V.J. & Wickens, M. Purification of RNA and RNA-protein complexes by an R17 coat protein affinity method. Nucleic Acids Res. 18, 6587–6594 (1990).
  35. Bertrand, E. et al. Localization of ASH1 mRNA particles in living yeast. Mol. Cell 2, 437–445 (1998).
  36. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2012).
  37. Greenleaf, W.J., Frieda, K.L., Foster, D.A., Woodside, M.T. & Block, S.M. Direct observation of hierarchical folding in single riboswitch aptamers. Science 319, 630–633 (2008).
  38. Hobson, D. & Uhlenbeck, O.C. Alanine scanning of MS2 coat protein reveals protein–phosphate contacts involved in thermodynamic hot spots. J. Mol. Biol. 356, 613–624 (2006).
  39. Varani, G. & McClain, W.H. The G·U wobble base pair. EMBO Rep. 1, 18–23 (2000).
  40. Valegård, K. et al. The three-dimensional structures of two complexes between recombinant MS2 capsids and RNA operator fragments reveal sequence-specific protein-RNA interactions. J. Mol. Biol. 270, 724–738 (1997).
  41. Breen, M.S., Kemena, C., Vlasov, P.K., Notredame, C. & Kondrashov, F.A. Epistasis as the primary factor in molecular evolution. Nature 490, 535–538 (2012).
  42. McCandlish, D.M., Rajon, E., Shah, P., Ding, Y. & Plotkin, J.B. The role of epistasis in protein evolution. Nature 497, E1–E2 (2013).
  43. Weinreich, D.M. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).
  44. Bridgham, J.T., Ortlund, E.A. & Thornton, J.W. An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515–519 (2009).
  45. Natarajan, C. et al. Epistasis among adaptive mutations in deer mouse hemoglobin. Science 340, 1324–1327 (2013).
  46. Rousset, F., Pélandakis, M. & Solignac, M. Evolution of compensatory substitutions through G.U intermediate state in Drosophila rRNA. Proc. Natl. Acad. Sci. USA 88, 10032–10036 (1991).
  47. Gell, C. et al. Single-molecule fluorescence resonance energy transfer assays reveal heterogeneous folding ensembles in a simple RNA stem–loop. J. Mol. Biol. 384, 264–278 (2008).
  48. Licatalosi, D.D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).
  49. Zhao, J. et al. Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953 (2010).
  50. Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).

Author information

  1. These authors contributed equally to this work.

    • Jason D Buenrostro &
    • Carlos L Araya

Affiliations

  1. Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.

    • Jason D Buenrostro,
    • Carlos L Araya,
    • Lauren M Chircus,
    • Curtis J Layton,
    • Michael P Snyder &
    • William J Greenleaf
  2. Program in Epithelial Biology and the Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, USA.

    • Jason D Buenrostro &
    • Howard Y Chang
  3. Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California, USA.

    • Lauren M Chircus

Contributions

W.J.G., J.D.B. and C.J.L. conceived of the method. J.D.B. developed the RNA display protocol. J.D.B. and L.M.C. designed and performed on-chip assays. L.M.C. designed and performed the protein purification and in vitro binding assays. J.D.B. wrote the image analysis algorithm with input from W.J.G. and C.J.L. C.L.A. developed and implemented the structural (epistatic), functional (modeling, kinetic) and evolutionary analyses. All authors interpreted the data and wrote the manuscript. W.J.G. supervised all aspects of this work.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Supplementary information

Video

  1. Video 1: Equilibrium binding and dissociation rate measurements on a sequenced flow cell (997 KB)

PDF files

  1. Supplementary Text and Figures (4 MB)

    Supplementary Figures 1–13 and Supplementary Discussion

Excel files

  1. Supplementary Table 1 (25 KB)

    Oligonucleotide sequences used in this study.

  2. Supplementary Table 2 (22 KB)

    Measured binding energies and quality metrics for 129,248 MS2 RNA hairpin sequences. Note: Positions are indexed from −15 to +3; "NA:NA" indicates the consensus sequence (with zero mutations).

  3. Supplementary Table 3 (812 KB)

    Measured dissociation and inferred association rates for 3,029 MS2 RNA hairpin sequences. Note: Positions are indexed from −15 to +3; "NA:NA" indicates the consensus sequence (with zero mutations).

  4. Supplementary Table 4 (222 KB)

    Summary of evolutionary path probabilities and constraint in 1,997 tesseracts. Note: Positions are indexed from −15 to +3; "NA:NA" indicates the consensus sequence (with zero mutations).

Zip files

  1. Supplementary Data (98 KB)

    Image analysis software.
