RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10⁷ RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.
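The mutational-path analysis in the abstract can be illustrated on a single tesseract, assuming it spans four mutations that separate a low-affinity hairpin from a high-affinity one: that gives 2⁴ = 16 genotypes and 4! = 24 direct orderings of the mutations. The following is a minimal sketch of a hypothetical, simplified model, not the authors' method: it assumes an adaptive walk that accepts only affinity-improving single mutations (lower ΔG) and chooses uniformly among those available at each step. The function name `path_probabilities`, the bit-mask genotype encoding, and the uniform-choice rule are all illustrative assumptions.

```python
from itertools import permutations

N_SITES = 4  # a tesseract: four mutations -> 2**4 = 16 genotypes, 4! = 24 orderings


def path_probabilities(dg):
    """Probability of each mutational ordering under a HYPOTHETICAL adaptive walk.

    dg maps a genotype (4-bit int; bit s set = mutation s present) to a binding
    free energy in kcal/mol, where lower means tighter binding. At each step the
    walk picks uniformly among the single mutations that lower dg; an ordering
    that requires a neutral or affinity-reducing step gets probability 0.
    """
    probs = {}
    for order in permutations(range(N_SITES)):
        genotype, p = 0, 1.0
        for site in order:
            # Single mutations not yet present that would improve affinity.
            improving = [
                s for s in range(N_SITES)
                if not genotype & (1 << s)
                and dg[genotype | (1 << s)] < dg[genotype]
            ]
            if site not in improving:
                p = 0.0  # step is inaccessible under selection for affinity
                break
            p /= len(improving)  # uniform choice among improving mutations
            genotype |= 1 << site
        probs[order] = p
    return probs


if __name__ == "__main__":
    # Hypothetical additive landscape: each mutation contributes -1 kcal/mol.
    additive = {g: -bin(g).count("1") for g in range(2 ** N_SITES)}
    probs = path_probabilities(additive)
    print(sum(p > 0 for p in probs.values()))  # prints 24
```

On an additive landscape every ordering is accessible with probability 1/24. Under sign epistasis, for example a hypothetical mutation 0 that is deleterious unless mutation 1 is already present, every ordering that places 0 before 1 drops to zero probability, which is the kind of path constraint that epistasis imposes.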
- Video 1: Equilibrium binding and dissociation rate measurements on a sequenced flow cell (997 KB)
- A small region of the flow cell is shown as fluorescently tagged MS2 coat protein is bound to the RNA clusters at increasing concentrations, to determine equilibrium dissociation constants, and is then removed from solution to measure dissociation rates. The first frame shows the fluorescence signal from a complementary oligonucleotide annealed to all RNAs. A single cluster is circled (blue), and its quantified fluorescence is shown in the inset. Subsequent frames show the quantified fluorescence signal at increasing concentrations of labeled MS2 coat protein; at the end of the binding series, the fit for that individual cluster is shown in the inset. The following frames show dissociation of labeled MS2, measured by replacing labeled with unlabeled MS2 in solution and observing the decay of fluorescence over time. The inset dissociation curve depicts data from all clusters that share the circled cluster's sequence (-5C variant).
- Supplementary Text and Figures (4 MB)
Supplementary Figures 1–13 and Supplementary Discussion
- Supplementary Table 1 (25 KB)
Oligonucleotide sequences used in this study.
- Supplementary Table 2 (22 KB)
Measured binding energies and quality metrics for 129,248 MS2 RNA hairpin sequences. Note: positions are given in -15,+3 indexing; "NA:NA" indicates the consensus sequence (zero mutations).
- Supplementary Table 3 (812 KB)
Measured dissociation and inferred association rates for 3,029 MS2 RNA hairpin sequences. Note: positions are given in -15,+3 indexing; "NA:NA" indicates the consensus sequence (zero mutations).
- Supplementary Table 4 (222 KB)
Summary of evolutionary path probabilities and constraint in 1,997 tesseracts. Note: positions are given in -15,+3 indexing; "NA:NA" indicates the consensus sequence (zero mutations).
- Supplementary Data (98 KB)
Image analysis software.
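Video 1 shows two fits per sequence: an equilibrium binding series across MS2 concentrations, and a fluorescence decay after labeled protein is replaced with unlabeled protein. Supplementary Table 3 pairs measured dissociation rates with inferred association rates. A minimal sketch of how these quantities relate follows. It assumes a single-site binding isotherm with no background signal, a single-exponential decay, and k_on inferred as k_off / K_d for 1:1 binding. The function names, the log-spaced grid search used in place of a nonlinear optimizer, and the 25 °C default temperature in the ΔG conversion are illustrative assumptions. The authors' image-analysis and fitting software (Supplementary Data) may differ.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fit_kd(conc_nm, signal):
    """Fit signal = fmax * c / (c + Kd) by scanning Kd over a log-spaced grid.

    For a fixed Kd the model is linear in fmax, so fmax has a closed-form
    least-squares value; the (Kd, fmax) pair with the smallest error is kept.
    """
    best = None
    for e in range(-200, 401):  # Kd from 0.01 nM to 10,000 nM in 0.01-decade steps
        kd = 10 ** (e / 100)
        occ = [c / (c + kd) for c in conc_nm]  # fractional occupancy at this Kd
        fmax = sum(o * y for o, y in zip(occ, signal)) / sum(o * o for o in occ)
        sse = sum((y - fmax * o) ** 2 for o, y in zip(occ, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]  # (Kd in nM, fmax in signal units)


def fit_koff(times_s, signal):
    """Fit signal = f0 * exp(-koff * t) by linear regression of ln(signal) on t."""
    logs = [math.log(y) for y in signal]
    n = len(times_s)
    t_bar, l_bar = sum(times_s) / n, sum(logs) / n
    slope = sum((t - t_bar) * (l - l_bar) for t, l in zip(times_s, logs)) / sum(
        (t - t_bar) ** 2 for t in times_s
    )
    return -slope  # koff in s^-1


def infer_kon(koff_per_s, kd_nm):
    """Association rate in M^-1 s^-1 from Kd = koff / kon (1:1 binding assumed)."""
    return koff_per_s / (kd_nm * 1e-9)


def binding_free_energy(kd_nm, temp_k=298.15):  # ASSUMED temperature
    """Delta G = RT ln(Kd / 1 M) in kcal/mol; more negative means tighter binding."""
    return R_KCAL * temp_k * math.log(kd_nm * 1e-9)
```

For a hypothetical sequence with K_d = 10 nM and k_off = 0.005 s⁻¹, `infer_kon` gives about 5 × 10⁵ M⁻¹ s⁻¹. The grid search keeps the sketch dependency-free but limits K_d resolution to a step of roughly 2%; a nonlinear least-squares routine would refine it.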