Article

Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes

Nature Biotechnology volume 32, pages 562–568 (2014)


RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10⁷ RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.
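Decomposing binding energy contributions and quantifying epistasis both start from converting measured affinities into free energies. The sketch below shows that bookkeeping under stated assumptions: a simple two-state binding model, a 1 M standard state, and a temperature of 25 °C. The temperature, the sign convention and all function names are illustrative assumptions, not details taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_K = 298.15          # assumed temperature (25 °C); not taken from the paper


def binding_dg(kd_molar, temp_k=T_K):
    """Binding free energy dG = RT ln(Kd / 1 M), in kcal/mol (more negative = tighter)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ddg(kd_variant, kd_reference, temp_k=T_K):
    """Energetic cost of a variant relative to a reference (positive = weaker binding)."""
    return binding_dg(kd_variant, temp_k) - binding_dg(kd_reference, temp_k)


def epistasis(kd_ref, kd_a, kd_b, kd_ab, temp_k=T_K):
    """Non-additivity of two mutations: ddG(AB) - [ddG(A) + ddG(B)].

    Zero means the two mutations act independently on binding energy;
    a nonzero value is one simple way to quantify molecular epistasis.
    """
    return ddg(kd_ab, kd_ref, temp_k) - (
        ddg(kd_a, kd_ref, temp_k) + ddg(kd_b, kd_ref, temp_k)
    )
```

With this convention a positive epistasis value means the double mutant binds more weakly than additivity of its single mutants would predict; for example, two mutations that each weaken binding tenfold are additive only if the double mutant is 100-fold weaker.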



Primary accessions

Sequence Read Archive



Acknowledgements

This work was supported by National Institutes of Health (NIH) grant R01-HG004361 (to H.Y.C.); H.Y.C. is an Early Career Scientist of the Howard Hughes Medical Institute. J.D.B. and L.M.C. acknowledge support from National Science Foundation Graduate Research Fellowships. J.D.B. also acknowledges support from NIH training grant T32HG000044. L.M.C. acknowledges support from NIH training grant T32GM067586. M.P.S. and C.L.A. acknowledge the NIH and the National Human Genome Research Institute (NHGRI) for funding through 5U54HG00455805. We thank D. Herschlag for feedback and advice throughout the methods development, and G. Sherlock and D. Herschlag for critical readings of earlier versions of this manuscript. We thank R. Landick for discussions regarding the design of our synthetic DNA library and K. Bajaj for discussions regarding quantification of cluster fluorescence. We also thank O.D. Phanstiel, M. Sikora and O. Cornejo for discussions on the modeling and evolutionary analyses.

Author information

Author notes

    Jason D Buenrostro and Carlos L Araya contributed equally to this work.

Affiliations

  1. Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.

    Jason D Buenrostro, Carlos L Araya, Lauren M Chircus, Curtis J Layton, Michael P Snyder & William J Greenleaf
  2. Program in Epithelial Biology and the Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, USA.

    Jason D Buenrostro & Howard Y Chang
  3. Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California, USA.

    Lauren M Chircus




Contributions

W.J.G., J.D.B. and C.J.L. conceived the method. J.D.B. developed the RNA display protocol. J.D.B. and L.M.C. designed and performed on-chip assays. L.M.C. designed and performed the protein purification and in vitro binding assays. J.D.B. wrote the image analysis algorithm with input from W.J.G. and C.J.L. C.L.A. developed and implemented the structural (epistatic), functional (modeling, kinetic) and evolutionary analyses. All authors interpreted the data and wrote the manuscript. W.J.G. supervised all aspects of this work.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to William J Greenleaf.

Supplementary information

PDF files

  1.

    Supplementary Text and Figures

    Supplementary Figures 1–13 and Supplementary Discussion

Excel files

  1.

    Supplementary Table 1

    Oligonucleotide sequences used in this study.

  2.

    Supplementary Table 2

    Measured binding energies and quality metrics for 129,248 MS2 RNA hairpin sequences. Note: Position indexes in -15,+3 indexing whereby. "NA:NA" indicates the consensus sequence (with zero mutations).

  3.

    Supplementary Table 3

    Measured dissociation and inferred association rates for 3,029 MS2 RNA hairpin sequences. Note: Position indexes in -15,+3 indexing whereby. "NA:NA" indicates the consensus sequence (with zero mutations).

  4.

    Supplementary Table 4

    Summary of evolutionary path probabilities and constraint in 1,997 tesseracts. Note: Position indexes in -15,+3 indexing whereby. "NA:NA" indicates the consensus sequence (with zero mutations).
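Supplementary Table 4 summarizes evolutionary paths within tesseracts. Assuming each tesseract spans four mutations separating a low-affinity from a high-affinity hairpin, it has 2⁴ = 16 vertices and 4! = 24 direct mutational orderings between its two extreme corners. The sketch below uses a deliberately simple stand-in for path analysis: it counts "accessible" orderings in which every single step strictly improves affinity (lowers the binding energy). The paper's own path probabilities may weight steps differently; the function names and the toy landscape are hypothetical.

```python
from itertools import permutations, product


def accessible_paths(dg, n_sites=4):
    """List orderings of n_sites mutations in which every single step
    strictly lowers the binding energy (i.e. strictly improves affinity).

    dg maps a tuple of 0/1 flags (1 = that site carries its target
    nucleotide) to a binding free energy; lower means tighter binding.
    """
    start = (0,) * n_sites
    paths = []
    for order in permutations(range(n_sites)):
        state = list(start)
        previous = dg[start]
        for site in order:
            state[site] = 1
            current = dg[tuple(state)]
            if current >= previous:
                break  # this step would not improve affinity: path not accessible
            previous = current
        else:
            paths.append(order)  # loop finished without a break
    return paths


def fraction_accessible(dg, n_sites=4):
    """Fraction of the n_sites! direct orderings that are accessible."""
    total = sum(1 for _ in permutations(range(n_sites)))
    return len(accessible_paths(dg, n_sites)) / total


# Hypothetical additive landscape: each mutation lowers dG by 1 kcal/mol.
additive = {bits: -float(sum(bits)) for bits in product((0, 1), repeat=4)}
```

In the additive toy landscape all 24 orderings are accessible. Making a single mutation deleterious on its own (for example, raising the energy of the state where only the first site is mutated) removes exactly the 6 orderings that begin with that mutation, which is the kind of sign epistasis that constrains evolutionary trajectories.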

Zip files

  1.

    Supplementary Data

    Image analysis software.


Videos

  1.

    Equilibrium binding and dissociation rate measurements on a sequenced flow cell

    A small region of the flow cell is shown as fluorescently tagged MS2 coat protein is bound to the RNA clusters at increasing concentrations, to determine equilibrium dissociation constants, and is then removed from solution to measure dissociation rates. In the first frame, the fluorescence signal from a complementary oligonucleotide annealed to all RNAs is shown. A single cluster is circled (blue), and its quantified fluorescence is shown in the inset. Subsequent frames show the quantified fluorescence signal at increasing concentrations of labeled MS2 coat protein. At the end of the binding experiment, the inset shows the binding-curve fit for that individual cluster. The following frames show dissociation of labeled MS2, measured by replacing labeled with unlabeled MS2 in solution and observing the decay of fluorescence over time. The inset dissociation curve depicts data from all clusters that share the circled cluster's sequence (the -5C variant).
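The video caption describes two per-cluster fits: an equilibrium binding curve across protein concentrations and a fluorescence decay after chasing with unlabeled protein. Supplementary Table 3 then reports inferred association rates. A pure-Python sketch of these steps follows. It assumes a single-site hyperbolic isotherm with no background term, a single-exponential decay on background-subtracted signal, the two-state relation Kd = koff/kon for inferring kon, and a median to combine clusters that share a sequence. All of these modeling choices, the nM/seconds units and the function names are assumptions, not the authors' published fitting procedure.

```python
import math
from statistics import median


def fit_kd(concs_nm, signal, kd_grid_nm):
    """Fit F = Fmax * C / (C + Kd) by grid search over Kd.

    For each candidate Kd the model is linear in Fmax, so Fmax has a
    closed-form least-squares solution; the Kd with the lowest residual wins.
    """
    best = None
    for kd in kd_grid_nm:
        x = [c / (c + kd) for c in concs_nm]
        fmax = sum(v * f for v, f in zip(x, signal)) / sum(v * v for v in x)
        sse = sum((f - fmax * v) ** 2 for v, f in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]


def fit_koff(times_s, signal):
    """Fit F(t) = F0 * exp(-koff * t) by linear regression of ln F on t.

    Assumes a background-subtracted, strictly positive signal.
    """
    ys = [math.log(f) for f in signal]
    t_mean = sum(times_s) / len(times_s)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, ys))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return -num / den


def infer_kon(koff_per_s, kd_nm):
    """Association rate in M^-1 s^-1 from the two-state relation Kd = koff / kon."""
    return koff_per_s / (kd_nm * 1e-9)


def per_sequence_estimate(cluster_values):
    """Combine fits from all clusters sharing a sequence (median resists outliers)."""
    return median(cluster_values)


# Hypothetical log-spaced Kd grid from 0.01 nM to 1000 nM.
KD_GRID_NM = [10 ** (k / 100) for k in range(-200, 301)]
```

Solving Fmax in closed form for each grid point keeps the example dependency-free; a nonlinear least-squares routine would give continuous rather than grid-limited Kd estimates.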
