Nucleic-acid-binding proteins are generally viewed as either specific or nonspecific, depending on characteristics of their binding sites in DNA or RNA1, 2. Most studies have focused on specific proteins, which identify cognate sites by binding with highest affinities to regions with defined signatures in sequence, structure or both1, 2, 3, 4. Proteins that bind to sites devoid of defined sequence or structure signatures are considered nonspecific1, 2, 5. Substrate binding by these proteins is poorly understood, and it is not known to what extent seemingly nonspecific proteins discriminate between different binding sites, aside from those sequestered by nucleic acid structures6. Here we systematically examine substrate binding by the apparently nonspecific RNA-binding protein C5, and find clear discrimination between different binding site variants. C5 is the protein subunit of the transfer RNA processing ribonucleoprotein enzyme RNase P from Escherichia coli. The protein binds 5′ leaders of precursor tRNAs at a site without sequence or structure signatures. We measure functional binding of C5 to all possible sequence variants in its substrate binding site, using a high-throughput sequencing kinetics approach (HITS-KIN) that simultaneously follows processing of thousands of RNA species. C5 binds different substrate variants with affinities varying by orders of magnitude. The distribution of functional affinities of C5 for all substrate variants resembles affinity distributions of highly specific nucleic acid binding proteins. Unlike these specific proteins, C5 does not bind its physiological RNA targets with the highest affinity, but with affinities near the median of the distribution, a region that is not associated with a sequence signature. We delineate defined rules governing substrate recognition by C5, which reveal specificity that is hidden in cellular substrates for RNase P. 
Our findings suggest that apparently nonspecific and specific RNA-binding modes may not differ fundamentally, but represent distinct parts of common affinity distributions.
- The role of RNA sequence and structure in RNA–protein interactions. J. Mol. Biol. 409, 574–587 (2011)
- On the specificity of DNA–protein interactions. Proc. Natl Acad. Sci. USA 83, 1608–1612 (1986)
- Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nature Biotechnol. 27, 667–670 (2009)
- Cooperativity in RNA–protein interactions: global analysis of RNA binding specificity. Cell Rep. 1, 570–581 (2012)
- Building specificity with nonspecific RNA-binding proteins. Nature Struct. Mol. Biol. 12, 645–653 (2005)
- Structural bias in T4 RNA ligase-mediated 3′-adapter ligation. Nucleic Acids Res. 40, e54 (2012)
- Ribonuclease P: a ribonucleoprotein enzyme. Curr. Opin. Chem. Biol. 4, 553–558 (2000)
- Importance of RNA–protein interactions in bacterial ribonuclease P structure and catalysis. Biopolymers 87, 329–338 (2007)
- Structure of a bacterial ribonuclease P holoenzyme in complex with tRNA. Nature 468, 784–789 (2010)
- The 5′ leader of precursor tRNAAsp bound to the Bacillus subtilis RNase P holoenzyme has an extended conformation. Biochemistry 44, 16130–16139 (2005)
- The role of induced fit and conformational changes of enzymes in specificity and catalysis. Bioorg. Chem. 16, 62–96 (1988)
- Enzyme Structure and Mechanism (Freeman, 1985)
- Enzyme specificity: its meaning in the general case. J. Theor. Biol. 108, 451–457 (1984)
- In Isotope Effects in Chemistry and Biology, 915–930 (CRC Press, 2006)
- Analysis of enzyme specificity by multiple substrate kinetics. Biochemistry 32, 4344–4348 (1993)
- Genomic SELEX for Hfq-binding RNAs identifies genomic aptamers predominantly in antisense transcripts. Nucleic Acids Res. 38, 3794–3808 (2010)
- Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010)
- Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009)
- Analysis of a complete DNA–protein affinity landscape. J. R. Soc. Interface 7, 397–408 (2010)
- Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nature Biotechnol. 29, 659–664 (2011)
- Determining the specificity of protein–DNA interactions. Nature Rev. Genet. 11, 751–760 (2010)
- Measuring the thermodynamics of RNA secondary structure formation. Biopolymers 44, 309–319 (1997)
- Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues. J. Theor. Biol. 248, 745–753 (2007)
- A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007)
- Quantitative analysis demonstrates most transcription factors require only simple models of specificity. Nature Biotechnol. 29, 480–483 (2011)
- Binding of C5 protein to P RNA enhances the rate constant for catalysis for P RNA processing of pre-tRNAs lacking a consensus (+ 1)/C(+ 72) pair. J. Mol. Biol. 395, 1019–1037 (2010)
- The building blocks and motifs of RNA architecture. Curr. Opin. Struct. Biol. 16, 279–287 (2006)
- Imino proton exchange and base-pair kinetics in RNA duplexes. Biochemistry 40, 8898–8904 (2001)
- Uniform binding of aminoacyl-tRNAs to elongation factor Tu by thermodynamic compensation. Science 294, 165–168 (2001)
- Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res. 14, 6661–6679 (1986)
- RNA-dependent folding and stabilization of C5 protein during assembly of the E. coli RNase P holoenzyme. J. Mol. Biol. 360, 190–203 (2006)
- Identification of individual nucleotides in the bacterial ribonuclease P ribozyme adjacent to the pre-tRNA cleavage site by short-range photo-cross-linking. Biochemistry 37, 17618–17628 (1998)
- Kinetics of enzyme reactions with competing alternative substrates. Mol. Pharmacol. 4, 621–629 (1968)
- GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 37, D93–D97 (2009)
- Fitting enzyme-kinetic data to V/K. Anal. Biochem. 132, 457–461 (1983)
- Rethinking fundamentals of enzyme action. Adv. Enzymol. 73, 25–55 (1999)
- Economic Forecasts and Policy (North Holland Publishing, 1961)
- Comparison of stopping rules in forward “stepwise” regression. J. Am. Stat. Assoc. 72, 46–53 (1977)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: C5 binding site in the 87 ptRNA leaders in E. coli.
a–c, Alignment and sequence logos for the C5 binding site in all 87 ptRNA leaders encoded by E. coli. Binding of C5 to the consecutive ptRNA positions −3 to −8 is well established, based on a crystal structure9 and biochemical evidence10; that is, the looping of bases seen for certain RNA- and DNA-binding proteins does not occur with C5. Consistent with this idea, we did not detect any sequence motif with the MEME software when including positions −1 to −10. a, Sequence alignment. Sequences were aligned with CLUSTAL. Coloured squares indicate the bases (C, blue; A, green; U, red; G, black). Anticodon, the anticodon recognized by the tRNA; tRNA#, the tRNA identification number; tRNA type, the amino acid. b, Sequence logo depicting the probability of any base at a given position, based on the alignment in a. The logo was generated with Weblogo. c, Sequence logo for the information content of the alignment in a. The logo was generated with Weblogo.
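The two logo quantities in panels b and c can be illustrated with a minimal sketch. It assumes an ungapped alignment of equal-length RNA sequences and computes information content as 2 bits minus the Shannon entropy at each position, without the small-sample correction that logo tools may apply. The function names and the toy alignment are hypothetical and are not the 87 E. coli leaders.

```python
import math
from collections import Counter

BASES = "ACGU"

def column_probabilities(aligned):
    """Per-position base probabilities for equal-length, ungapped RNA sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    probs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        total = sum(counts[b] for b in BASES)
        probs.append({b: counts[b] / total for b in BASES})
    return probs

def information_content(probs):
    """Bits per position: 2 - Shannon entropy (no small-sample correction)."""
    bits = []
    for column in probs:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        bits.append(2.0 - entropy)
    return bits

# Hypothetical toy alignment, for illustration only.
toy = ["AAAAAG", "CAUAAG", "GAGAAG", "UACAAG"]
probs = column_probabilities(toy)
bits = information_content(probs)
```

In the toy alignment, a position where all four bases are equally frequent carries 0 bits, and a fully conserved position carries 2 bits. An alignment without a sequence signature would give values near 0 at every position.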
- Extended Data Figure 2: Preparation of DNA libraries for Illumina sequencing.
a, BAR, the indexing barcode; NN, the degenerate barcode; RT, reverse transcription. For primer sequences, see Methods. b, DNA libraries (PCR products; see a) for samples at the time points indicated. Controls: lane 5, no RNA; lane 6, no reverse transcriptase. c, Read structure. Nucleotides 1 and 2 are the degenerate barcode; nucleotides 3–5 are the sample barcode (index tag); nucleotides 6–29 are additional leader sequence; nucleotides 30–35 are the randomized leader sequence; nucleotides 38 onwards are tRNA.
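The read structure in panel c maps directly onto string slices. The sketch below is a hypothetical helper, not the authors' pipeline; it converts the 1-based positions in the caption to 0-based Python slices. The caption does not assign nucleotides 36 and 37, so they are returned separately as `unassigned` rather than guessed at.

```python
from typing import NamedTuple

class ParsedRead(NamedTuple):
    degenerate_barcode: str  # nucleotides 1-2
    sample_barcode: str      # nucleotides 3-5 (index tag)
    constant_leader: str     # nucleotides 6-29
    randomized_leader: str   # nucleotides 30-35
    unassigned: str          # nucleotides 36-37, not described in the caption
    trna: str                # nucleotides 38 onwards

def parse_read(read: str) -> ParsedRead:
    """Split a sequencing read into the segments listed in the caption."""
    if len(read) < 38:
        raise ValueError("read is too short to contain any tRNA sequence")
    return ParsedRead(
        degenerate_barcode=read[0:2],
        sample_barcode=read[2:5],
        constant_leader=read[5:29],
        randomized_leader=read[29:35],
        unassigned=read[35:37],
        trna=read[37:],
    )

# Synthetic read assembled for illustration; all segments are made up.
example = "AC" + "GTA" + "T" * 24 + "CTCCTG" + "AA" + "GGCTACG"
parsed = parse_read(example)
```

Grouping reads by `sample_barcode` and counting each `randomized_leader` would then give a read number per sequence variant per time point.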
- Extended Data Figure 3: Multiple turnover reaction scheme.
E, enzyme; ES1...i, individual enzyme substrate complexes; K1...i, individual functional binding constants; S1...i, individual substrate variants; V1...i, individual reaction rate constants.
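When many substrates compete for one enzyme, as in this scheme, the standard result for competing alternative substrates is that each substrate is consumed in proportion to its own V/K and its own concentration. Integrating that relation gives ln(fraction remaining of variant i) / ln(fraction remaining of reference) = (V/K)i / (V/K)ref. The sketch below applies this ratio. It assumes that krel denotes this V/K ratio and that the fraction of each substrate remaining is already known; how read numbers are converted to fractions is described in the Methods and is not reproduced here. The function name and numbers are hypothetical.

```python
import math

def relative_rate_constant(frac_remaining: float, ref_frac_remaining: float) -> float:
    """Ratio of V/K for a variant to V/K for a reference substrate.

    Both arguments are fractions of substrate still unreacted at the same
    time point in the same competitive reaction.
    """
    for frac in (frac_remaining, ref_frac_remaining):
        if not 0.0 < frac < 1.0:
            raise ValueError("fractions must lie strictly between 0 and 1")
    return math.log(frac_remaining) / math.log(ref_frac_remaining)

# Hypothetical values: 50% of the reference and 25% of the variant remain.
k_rel = relative_rate_constant(0.25, 0.5)
```

With these made-up values the variant has reacted through two half-lives of the reference, so the ratio is 2. Because the free enzyme concentration cancels in the ratio, the estimate does not depend on knowing it.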
- Extended Data Figure 4: Effect of the 21 nucleotide extension on ptRNA processing by RNase P.
a, Relative processing rate constants were measured by PAGE for three sequence variants from different parts of the affinity distribution. Reactions for each sequence variant were conducted in the presence of the randomized population (unlabelled) with equal amounts of substrate with (S/21) and without (S/nL) the 21-nucleotide extension. The asterisk marks the position of the radiolabel at the 5′ end of the substrate. Reactions were conducted under the conditions described in the Methods. b, PAGE for the reaction of the reference sequence variant. The time point at 5 min is marked for reference. c, The effects of the 21-nucleotide extension on relative processing rate constants of the three indicated sequence variants. The position of each sequence variant in the affinity distribution of all sequence variants (Fig. 2d) is given for reference by the vertical line above the plot. The number indicates the factor (S/nL)/(S/21) by which the 21-nucleotide extension decreases the relative rate constant of the given sequence variant, given as the average from three independent experiments. The horizontal line approximates the degree of the relative change. The 21-nucleotide extension decreases the krel observed for sequence variant CTCCTG by a factor of 2.3. For the genomically encoded leader sequence AAAAAG, the 21-nucleotide extension decreases krel by a factor of 0.95; that is, the substrate with the extension reacts slightly faster than the substrate without the extension. The fast-reacting substrate (TTATAT) is also only minimally affected by the extension (0.92). Together, the data show only minor effects of the 21-nucleotide extension on the position of a given sequence variant in the affinity distribution.
- Extended Data Figure 5: Processing of ptRNAMet(-3-8) by RNase P without C5.
Distribution of krel values for processing of ptRNAMet(-3-8) by RNase P without C5 (black line). Data were obtained analogously to those with C5. For comparison, the distribution of krel values with C5 is shown (red line).
- Extended Data Figure 6: Sequence logos are only associated with the high-affinity tail of the distribution.
a, Plot of sequence variants ranked from weakest to tightest binder to the specific transcription factor Arid3a (Fig. 2d), based on data published previously18. To facilitate direct comparison to the six-nucleotide binding site of C5, only approximately half of all sequences are shown in the plot, and only six positions (positions two to seven, as indicated) of the eight-nucleotide binding site are shown. The position in the binding site is marked on the right. The brackets mark the 0.1% of sequence variants (33 sequences) that bind tightest, fall in the middle, and bind weakest. Sequence logos show the information content in these sequences. The logos were generated with Weblogo. Sequence signatures of the tightest binding variants are highly enriched in physiological substrates of Arid3a18. b, Plot of sequence variants ranked from weakest to tightest binder to another specific transcription factor, Hnf4a, based on data published previously18. Approximately half of all sequences are shown in the plot, and six positions (positions two to seven, as indicated) of the eight-nucleotide binding site are shown. Sequence signatures of the tightest binding variants are highly enriched in physiological substrates of Hnf4a18. c, Plot of sequence variants ranked from slowest to fastest reacting for C5 (Fig. 2e). The brackets mark the 1% of sequence variants that react fastest, fall in the middle, and react slowest. Sequence logos were generated as in a.
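The ranking and bracketing described in this caption can be sketched as follows. This is an illustrative helper under stated assumptions: the bracket size is the stated fraction of all variants rounded to the nearest integer, and the middle bracket is centred on the midpoint of the ranking. The function name and the krel values are hypothetical placeholders, not measured data.

```python
from itertools import product

def rank_brackets(krel_by_variant, fraction):
    """Return the slowest, middle and fastest brackets of a ranked distribution."""
    ranked = sorted(krel_by_variant, key=krel_by_variant.get)  # slowest first
    size = max(1, round(fraction * len(ranked)))
    mid_start = (len(ranked) - size) // 2
    return {
        "slowest": ranked[:size],
        "middle": ranked[mid_start:mid_start + size],
        "fastest": ranked[-size:],
    }

# All 4**6 = 4096 variants of a six-nucleotide site, with placeholder krel
# values that simply increase with enumeration order (not real data).
variants = ["".join(bases) for bases in product("ACGT", repeat=6)]
placeholder_krel = {seq: float(i + 1) for i, seq in enumerate(variants)}

brackets = rank_brackets(placeholder_krel, fraction=0.01)
```

Under this rounding assumption a 1% bracket of 4,096 variants holds 41 sequences. Each bracket could then be passed to a logo tool to check whether it carries a sequence signature.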
- Extended Data Figure 7: Sequence determinants for substrate recognition by C5.
a, Model considering identity, but not position of a given base in the C5 binding site. Ranking of the four bases according to their potential to promote (positive linear coefficient) or decrease (negative linear coefficient) functional C5 binding. For calculation of linear coefficients, see the Methods. b, Position weight matrix (PWM) model considering both base identity and position in the binding site, but assuming independent contributions of each position. The plot shows the ranking of the bases according to their potential to promote (positive linear coefficient) or decrease (negative linear coefficient) functional C5 binding, relative to the reference sequence (AAAAAG, Fig. 1c). Bases are coloured as in a. For the calculation of linear coefficients, see the Methods.
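The independent-position model in panel b can be illustrated by its feature encoding. The sketch below assumes one indicator variable for each non-reference base at each position, so the reference sequence AAAAAG maps to all zeros and each linear coefficient is a change relative to the reference. The response variable, the fitting procedure, and the coefficient values used here are assumptions for illustration; the actual calculation is described in the Methods.

```python
REFERENCE = "AAAAAG"
BASES = "ACGT"

def pwm_features(variant, reference=REFERENCE):
    """Indicator features, one per (position, non-reference base) pair."""
    if len(variant) != len(reference):
        raise ValueError("variant and reference must have the same length")
    features = []
    for var_base, ref_base in zip(variant, reference):
        for base in BASES:
            if base != ref_base:
                features.append(1 if var_base == base else 0)
    return features

def predict(variant, coefficients, intercept=0.0):
    """Sum of independent per-position contributions relative to the reference."""
    features = pwm_features(variant)
    if len(coefficients) != len(features):
        raise ValueError("expected one coefficient per feature")
    return intercept + sum(c * x for c, x in zip(coefficients, features))

# Hypothetical coefficients: every substitution contributes +0.1.
toy_coefficients = [0.1] * 18
reference_score = predict("AAAAAG", toy_coefficients)
variant_score = predict("CTCCTG", toy_coefficients)
```

A six-nucleotide site gives 18 features (3 non-reference bases at each of 6 positions). Because contributions are summed independently, a model of this form cannot capture coupling between positions.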
- Extended Data Figure 8: Neural network analysis.
Correlation between observed krel and values calculated with the best model obtained by neural network analysis (Methods).
Extended Data Tables
- Supplementary Table 1
This file contains the read number for each sequence variant at each time point. N/A indicates reads below the quality threshold for a variant.