Hidden specificity in an apparently nonspecific RNA-binding protein

Journal name: Nature
Volume: 502
Pages: 385–388
DOI: 10.1038/nature12543

Nucleic-acid-binding proteins are generally viewed as either specific or nonspecific, depending on characteristics of their binding sites in DNA or RNA1, 2. Most studies have focused on specific proteins, which identify cognate sites by binding with highest affinities to regions with defined signatures in sequence, structure or both1, 2, 3, 4. Proteins that bind to sites devoid of defined sequence or structure signatures are considered nonspecific1, 2, 5. Substrate binding by these proteins is poorly understood, and it is not known to what extent seemingly nonspecific proteins discriminate between different binding sites, aside from those sequestered by nucleic acid structures6. Here we systematically examine substrate binding by the apparently nonspecific RNA-binding protein C5, and find clear discrimination between different binding site variants. C5 is the protein subunit of the transfer RNA processing ribonucleoprotein enzyme RNase P from Escherichia coli. The protein binds 5′ leaders of precursor tRNAs at a site without sequence or structure signatures. We measure functional binding of C5 to all possible sequence variants in its substrate binding site, using a high-throughput sequencing kinetics approach (HITS-KIN) that simultaneously follows processing of thousands of RNA species. C5 binds different substrate variants with affinities varying by orders of magnitude. The distribution of functional affinities of C5 for all substrate variants resembles affinity distributions of highly specific nucleic acid binding proteins. Unlike these specific proteins, C5 does not bind its physiological RNA targets with the highest affinity, but with affinities near the median of the distribution, a region that is not associated with a sequence signature. We delineate defined rules governing substrate recognition by C5, which reveal specificity that is hidden in cellular substrates for RNase P. 
Our findings suggest that apparently nonspecific and specific RNA-binding modes may not differ fundamentally, but represent distinct parts of common affinity distributions.
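The internal-competition idea behind following thousands of substrates in one reaction can be sketched in a few lines. When many substrates compete for the same enzyme, the ratio of two specificity constants follows from how much of each substrate remains, which is the standard alternative-substrate relation of the kind treated in refs 15 and 33. This is a minimal sketch and not the authors' analysis pipeline: the function names, the example read counts, and the use of an independently measured total fraction reacted to convert read shares into remaining fractions are all assumptions made for illustration.

```python
import math
from itertools import product


def all_variants(length=6, alphabet="ACGU"):
    """Enumerate every sequence of the given length (4**6 = 4096 for six positions)."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


def remaining_fraction(reads_t, reads_0, total_reads_t, total_reads_0, frac_reacted):
    """Fraction of one substrate variant still unreacted at time t.

    Assumption: read counts are proportional to abundance in the unreacted
    substrate pool, and frac_reacted (the total fraction of substrate
    processed) is measured independently, for example by PAGE.
    """
    share_t = reads_t / total_reads_t
    share_0 = reads_0 / total_reads_0
    return (share_t / share_0) * (1.0 - frac_reacted)


def k_rel(rem_variant, rem_reference):
    """Relative rate constant under internal competition:
    k_rel = ln(S_i / S_i,0) / ln(S_ref / S_ref,0)."""
    return math.log(rem_variant) / math.log(rem_reference)


# Made-up illustrative numbers, not data from the paper.
rem_ref = remaining_fraction(900, 1000, 100_000, 100_000, frac_reacted=0.2)
rem_fast = remaining_fraction(500, 1000, 100_000, 100_000, frac_reacted=0.2)
print(len(all_variants()))
print(k_rel(rem_fast, rem_ref))
```

A variant that is depleted faster than the reference gives k_rel > 1, and the reference itself gives k_rel = 1 by construction. The ratio is undefined when the reference has not reacted at all (logarithm of 1), so very early time points are unsuitable for this calculation.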

At a glance

Figures

  1. Processing of precursor tRNA with randomized leader sequences.
    Figure 1: Processing of precursor tRNA with randomized leader sequences.

    a, ptRNA processing reaction by RNase P. b, Structure of the RNase P holoenzyme9. c, Sequences of non-initiator ptRNAMet leaders (reference, black; randomized, red). The tRNA body is omitted for clarity. The arrow indicates the cleavage site. d, Time courses of RNase P processing of ptRNAMet82 (black) and ptRNAMet(-3-8N) (red) in the presence (filled circles) and in the absence (open circles) of C5. The solid lines are fits to the integrated rate equation for a biphasic first-order reaction. e, Polyacrylamide gel electrophoresis (PAGE) of reactions processed for Illumina sequencing. f, Distributions of species for individual time points, ranked from fastest to slowest. The y axis marks the change in read numbers for each substrate species at the reaction time indicated, normalized to the number of reads at t = 0. Colours emphasize the different reaction times.

  2. Discrimination of C5 between different precursor tRNAMet leader sequences.
    Figure 2: Discrimination of C5 between different precursor tRNAMet leader sequences.

    a, Relative rate constants (krel) for processing of all ptRNA leader sequence variants, ranked from slow to fast. Relative rate constants are averaged from four values (two time points of two experiments) and shown only for sequences for which data from all four measurements passed quality control criteria (Extended Data Table 1). The line at krel = 1 marks the reference sequence. b, Correlation of relative rate constants from two independent biological replicates (red line, linear fit through the data; R2, correlation coefficient). c, Correlation between relative rate constants obtained by PAGE and by the HITS-KIN approach for selected sequence variants. Error bars represent the s.d. of three or more individual measurements. d, Distribution of relative rate constants for processing of ptRNAMet(-3-8N) sequence variants by C5 (black) and apparent affinities for DNA binding by the transcription factor Arid3a, indicated as Z-scores based on published microarray data18. The Z-score is not identical to krel values, but accurately reflects affinity-based ranking of all sequences18 (triangles, krel values for genomic leader sequences of ptRNAMet). e, Plot of all sequence variants ranked from slowest to fastest processed. The bracket marks 0.3% of sequence variants with the largest relative rate constants. f, Sequence logo for this fraction.

  3. Rules for sequence discrimination by C5.
    Figure 3: Rules for sequence discrimination by C5.

    a, Correlation between observed krel and values calculated with the best fit of the data to models of increasing complexity. Logarithmic krel values are used because of their correspondence to differences in binding energies30. R2 expresses the correlation of each model with measured processing rate constants. b, Functional coupling between two base positions. Yellow squares show promotion of processing (high linear coefficients), black squares indicate small or no effects, blue squares mark inhibition of processing.

  4. C5 binding site in the 87 ptRNA leaders in E. coli.
    Extended Data Fig. 1: C5 binding site in the 87 ptRNA leaders in E. coli.

    a–c, Alignment and sequence logos for the C5 binding site in all 87 ptRNA leaders encoded by E. coli. Binding of C5 to the consecutive ptRNA positions −3 to −8 is well established, based on a crystal structure9 and biochemical evidence10; that is, the looping of bases seen for certain RNA- and DNA-binding proteins does not occur with C5. Consistent with this idea, we did not detect any sequence motif with the MEME software when including positions −1 to −10. a, Sequence alignment. Sequences were aligned with CLUSTAL. Coloured squares indicate the bases (C, blue; A, green; U, red; G, black). Anticodon, the anticodon recognized by the tRNA; tRNA#, the tRNA identification number; tRNA type, the amino acid. b, Sequence logo depicting the probability of any base at a given position, based on the alignment in a. The logo was generated with Weblogo. c, Sequence logo for the information content of the alignment in a. The logo was generated with Weblogo.

  5. Preparation of DNA libraries for Illumina sequencing.
    Extended Data Fig. 2: Preparation of DNA libraries for Illumina sequencing.

    a, BAR, the indexing barcode; NN, the degenerate barcode. For primer sequences see Methods. RT, reverse transcription. b, DNA libraries (PCR products, a) for samples at the time points indicated. Controls: lane 5, no RNA; lane 6, no reverse transcriptase. c, Read structure. Nucleotides 1 and 2 are the degenerate barcode; nucleotides 3–5 are the sample barcode (index tag); nucleotides 6–29 are additional leader sequence; nucleotides 30–35 are the randomized leader sequence; nucleotides 38 onwards are tRNA.

  6. Multiple turnover reaction scheme.
    Extended Data Fig. 3: Multiple turnover reaction scheme.

    E, enzyme; ES1...i, individual enzyme–substrate complexes; K1...i, individual functional binding constants; S1...i, individual substrate variants; V1...i, individual reaction rate constants.

  7. Effect of the 21 nucleotide extension on ptRNA processing by RNase P.
    Extended Data Fig. 4: Effect of the 21 nucleotide extension on ptRNA processing by RNase P.

    a, Relative processing rate constants were measured by PAGE for three sequence variants from different parts of the affinity distribution. Reactions for each sequence variant were conducted in the presence of the randomized population (unlabelled) with equal amounts of substrate with (S/21) and without (S/nL) the 21-nucleotide extension. The asterisk marks the position of the radiolabel at the 5′ end of the substrate. Reactions were conducted under the conditions described in the Methods. b, PAGE for the reaction of the reference sequence variant. The time point at 5 min is marked for reference. c, The effects of the 21-nucleotide extension on relative processing rate constants of the three indicated sequence variants. The position of each sequence variant in the affinity distribution of all sequence variants (Fig. 2d) is given for reference by the vertical line above the plot. The number indicates the factor (S/nL)/(S/21) by which the 21-nucleotide extension decreases the relative rate constant of the given sequence variant, given as the average from three independent experiments. The horizontal line approximates the degree of the relative change. The 21-nucleotide extension decreases the observed krel for sequence variant CTCCTG by a factor of 2.3. For the genomically encoded leader sequence AAAAAG, the 21-nucleotide extension decreases krel by a factor of 0.95; that is, the substrate with the extension reacts slightly faster than the substrate without the extension. The fast-reacting substrate (TTATAT) is also only minimally affected by the extension (0.92). Together, the data show only minor effects of the 21-nucleotide extension on the position of a given sequence variant in the affinity distribution.

  8. Processing of ptRNAMet(-3-8) by RNase P without C5.
    Extended Data Fig. 5: Processing of ptRNAMet(-3-8) by RNase P without C5.

    Distribution of krel values for processing of ptRNAMet(-3-8) by RNase P without C5 (black line). Data were obtained analogously to those with C5. For comparison, the distribution of krel values with C5 is shown (red line).

  9. Sequence logos are only associated with the high-affinity tail of the distribution.
    Extended Data Fig. 6: Sequence logos are only associated with the high-affinity tail of the distribution.

    a, Plot of sequence variants ranked from weakest to tightest binder to the specific transcription factor Arid3a (Fig. 2d), based on data published previously18. To facilitate direct comparison to the six-nucleotide binding site of C5, only approximately half of all sequences are shown in the plot, and only six positions (positions two to seven, as indicated) of the eight-nucleotide binding site are shown. The position in the binding site is marked on the right. The brackets mark the 0.1% of sequence variants (33 sequences) that bind tightest, fall into the middle of the distribution, and bind weakest. Sequence logos show the information content in these sequences. The logos were generated with Weblogo. Sequence signatures of the tightest binding variants are highly enriched in physiological substrates of Arid3a18. b, Plot of sequence variants ranked from weakest to tightest binder to another specific transcription factor, Hnf4a, based on data published previously18. Approximately half of all sequences and six positions (positions two to seven, as indicated) of the eight-nucleotide binding site are shown in the plot. Sequence signatures of the tightest binding variants are highly enriched in physiological substrates of Hnf4a18. c, Plot of sequence variants ranked from slowest to fastest reacting for C5 (Fig. 2e). The brackets mark the 1% of sequence variants that react fastest, fall into the middle of the distribution, and react slowest. Sequence logos were generated as in a.

  10. Sequence determinants for substrate recognition by C5.
    Extended Data Fig. 7: Sequence determinants for substrate recognition by C5.

    a, Model considering identity, but not position of a given base in the C5 binding site. Ranking of the four bases according to their potential to promote (positive linear coefficient) or decrease (negative linear coefficient) functional C5 binding. For calculation of linear coefficients, see the Methods. b, Position weight matrix (PWM) model considering both base identity and position in the binding site, but assuming independent contributions of each position. The plot shows the ranking of the bases according to their potential to promote (positive linear coefficient) or decrease (negative linear coefficient) functional C5 binding, relative to the reference sequence (AAAAAG, Fig. 1c). Bases are coloured as in a. For the calculation of linear coefficients, see the Methods.

  11. Neural network analysis.
    Extended Data Fig. 8: Neural network analysis.

    Correlation between observed krel and values calculated with the best model obtained by neural network analysis (Methods).
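The read structure given in the caption of Extended Data Fig. 2c maps directly onto string slicing, which the following minimal sketch illustrates. The coordinates are taken from that caption; everything else is an assumption for illustration, including the function and field names, the synthetic example read, and the decision to return nucleotides 36 and 37 (which the caption does not describe) under a separate, hypothetical key.

```python
from typing import Dict

# 1-based inclusive coordinates as stated in the caption of Extended Data Fig. 2c.
READ_FIELDS = {
    "degenerate_barcode": (1, 2),
    "sample_barcode": (3, 5),
    "additional_leader": (6, 29),
    "randomized_leader": (30, 35),
}
TRNA_START = 38  # tRNA begins at nucleotide 38; 36-37 are not described in the caption


def split_read(read: str) -> Dict[str, str]:
    """Slice one sequencing read into the segments named in the caption."""
    if len(read) < TRNA_START:
        raise ValueError("read too short to reach the start of the tRNA")
    # Convert 1-based inclusive coordinates to Python's 0-based half-open slices.
    parts = {name: read[start - 1:end] for name, (start, end) in READ_FIELDS.items()}
    parts["unassigned_36_37"] = read[35:37]  # hypothetical key for the undescribed positions
    parts["trna"] = read[TRNA_START - 1:]
    return parts


# Synthetic read built only to exercise the slicing; it is not a real sequence.
example = "NN" + "ACG" + "T" * 24 + "AAAAAG" + "CC" + "GGCTACGTAGC"
fields = split_read(example)
print(fields["sample_barcode"], fields["randomized_leader"])
```

Grouping reads by `sample_barcode` would separate time points, and counting `randomized_leader` values within each group would give per-variant read numbers. Collapsing reads that share a degenerate barcode is a further step that this sketch does not attempt.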

Tables

  1. Sequencing data.
    Extended Data Table 1: Sequencing data.

References

  1. Gupta, A. & Gribskov, M. The role of RNA sequence and structure in RNA–protein interactions. J. Mol. Biol. 409, 574–587 (2011)
  2. von Hippel, P. H. & Berg, O. G. On the specificity of DNA–protein interactions. Proc. Natl Acad. Sci. USA 83, 1608–1612 (1986)
  3. Ray, D. et al. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nature Biotechnol. 27, 667–670 (2009)
  4. Campbell, Z. T. et al. Cooperativity in RNA–protein interactions: global analysis of RNA binding specificity. Cell Rep. 1, 570–581 (2012)
  5. Singh, R. & Valcárcel, J. Building specificity with nonspecific RNA-binding proteins. Nature Struct. Mol. Biol. 12, 645–653 (2005)
  6. Zhuang, F., Fuchs, R. T., Sun, Z., Zheng, Y. & Robb, G. B. Structural bias in T4 RNA ligase-mediated 3′-adapter ligation. Nucleic Acids Res. 40, e54 (2012)
  7. Kurz, J. C. & Fierke, C. A. Ribonuclease P: a ribonucleoprotein enzyme. Curr. Opin. Chem. Biol. 4, 553–558 (2000)
  8. Smith, J. K., Hsieh, J. & Fierke, C. A. Importance of RNA–protein interactions in bacterial ribonuclease P structure and catalysis. Biopolymers 87, 329–338 (2007)
  9. Reiter, N. J. et al. Structure of a bacterial ribonuclease P holoenzyme in complex with tRNA. Nature 468, 784–789 (2010)
  10. Rueda, D., Hsieh, J., Day-Storms, J. J., Fierke, C. A. & Walter, N. G. The 5′ leader of precursor tRNAAsp bound to the Bacillus subtilis RNase P holoenzyme has an extended conformation. Biochemistry 44, 16130–16139 (2005)
  11. Herschlag, D. The role of induced fit and conformational changes of enzymes in specificity and catalysis. Bioorg. Chem. 16, 62–96 (1988)
  12. Fersht, A. R. Enzyme Structure and Mechanism (Freeman, 1985)
  13. Cornish-Bowden, A. Enzyme specificity: its meaning in the general case. J. Theor. Biol. 108, 451–457 (1984)
  14. Cleland, W. W. in Isotope Effects in Chemistry and Biology (eds Kohen, A. & Limbach, H. H.) 915–930 (CRC Press, 2006)
  15. Schellenberger, V., Siegel, R. A. & Rutter, W. J. Analysis of enzyme specificity by multiple substrate kinetics. Biochemistry 32, 4344–4348 (1993)
  16. Lorenz, C. et al. Genomic SELEX for Hfq-binding RNAs identifies genomic aptamers predominantly in antisense transcripts. Nucleic Acids Res. 38, 3794–3808 (2010)
  17. Pitt, J. N. & Ferré-D'Amaré, A. R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010)
  18. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009)
  19. Rowe, W. et al. Analysis of a complete DNA–protein affinity landscape. J. R. Soc. Interface 7, 397–408 (2010)
  20. Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nature Biotechnol. 29, 659–664 (2011)
  21. Stormo, G. D. & Zhao, Y. Determining the specificity of protein–DNA interactions. Nature Rev. Genet. 11, 751–760 (2010)
  22. SantaLucia, J. Jr & Turner, D. H. Measuring the thermodynamics of RNA secondary structure formation. Biopolymers 44, 309–319 (1997)
  23. Forsdyke, D. R. Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues. J. Theor. Biol. 248, 745–753 (2007)
  24. Maerkl, S. J. & Quake, S. R. A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007)
  25. Zhao, Y. & Stormo, G. Quantitative analysis demonstrates most transcription factors require only simple models of specificity. Nature Biotechnol. 29, 480–483 (2011)
  26. Sun, L., Campbell, F. E., Yandek, L. E. & Harris, M. E. Binding of C5 protein to P RNA enhances the rate constant for catalysis for P RNA processing of pre-tRNAs lacking a consensus (+ 1)/C(+ 72) pair. J. Mol. Biol. 395, 1019–1037 (2010)
  27. Leontis, N. B., Lescoute, A. & Westhof, E. The building blocks and motifs of RNA architecture. Curr. Opin. Struct. Biol. 16, 279–287 (2006)
  28. Snoussi, K. & Leroy, J. L. Imino proton exchange and base-pair kinetics in RNA duplexes. Biochemistry 40, 8898–8904 (2001)
  29. LaRiviere, F. J., Wolfson, A. D. & Uhlenbeck, O. C. Uniform binding of aminoacyl-tRNAs to elongation factor Tu by thermodynamic compensation. Science 294, 165–168 (2001)
  30. Stormo, G. D., Schneider, T. D. & Gold, L. Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res. 14, 6661–6679 (1986)
  31. Guo, X. et al. RNA-dependent folding and stabilization of C5 protein during assembly of the E. coli RNase P holoenzyme. J. Mol. Biol. 360, 190–203 (2006)
  32. Christian, E. L., McPheeters, D. S. & Harris, M. E. Identification of individual nucleotides in the bacterial ribonuclease P ribozyme adjacent to the pre-tRNA cleavage site by short-range photo-cross-linking. Biochemistry 37, 17618–17628 (1998)
  33. Cha, S. Kinetics of enzyme reactions with competing alternative substrates. Mol. Pharmacol. 4, 621–629 (1968)
  34. Chan, P. P. & Lowe, T. M. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 37, D93–D97 (2009)
  35. Northrop, D. B. Fitting enzyme-kinetic data to V/K. Anal. Biochem. 132, 457–461 (1983)
  36. Northrop, D. B. Rethinking fundamentals of enzyme action. Adv. Enzymol. 73, 25–55 (1999)
  37. Theil, H. Economic Forecasts and Policy (North Holland Publishing, 1961)
  38. Bendel, R. B. & Afifi, A. A. Comparison of stopping rules in forward “stepwise” regression. J. Am. Stat. Assoc. 72, 46–53 (1977)


Author information

Affiliations

  1. Center for RNA Molecular Biology, Case Western Reserve University, Cleveland, Ohio 44106, USA

    • Ulf-Peter Guenther,
    • Frank E. Campbell &
    • Eckhard Jankowsky
  2. Department of Biochemistry, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44106, USA

    • Ulf-Peter Guenther,
    • Lindsay E. Yandek,
    • Courtney N. Niland,
    • Vernon E. Anderson,
    • Michael E. Harris &
    • Eckhard Jankowsky
  3. Department of Management, Zicklin School of Business, Baruch College, The City University of New York, New York 10010, USA

    • David Anderson

Contributions

U.-P.G., M.E.H. and E.J. designed the study. U.-P.G., L.E.Y., C.N.N. and F.E.C. performed the experiments. V.E.A. contributed to the development of the data analysis framework. D.A. developed and performed the modelling for binding models. U.-P.G., D.A., M.E.H. and E.J. analysed the data. U.-P.G., M.E.H. and E.J. wrote the paper.

Competing financial interests

The authors declare no competing financial interests.



Supplementary information

Excel files

  1. Supplementary Table 1 (310 KB)

    This file contains the read number for each sequence variant at each time point. N/A indicates reads below the quality threshold for a variant.
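A table of this kind needs its N/A entries handled before any rate analysis. The following minimal sketch shows one way to do that: it maps N/A to a missing value, keeps only variants with a read number at every time point, and expresses each read number relative to t = 0. The in-memory layout (variant mapped to a list of cell strings), the function names and the example values are hypothetical and do not reflect the actual layout or contents of the supplementary file.

```python
from typing import Dict, List, Optional


def parse_count(cell: str) -> Optional[int]:
    """Convert one table cell to an int, mapping 'N/A' (below quality threshold) to None."""
    cell = cell.strip()
    return None if cell.upper() == "N/A" else int(cell)


def complete_rows(table: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Keep only variants whose read number is available at every time point."""
    kept: Dict[str, List[int]] = {}
    for variant, cells in table.items():
        counts = [parse_count(c) for c in cells]
        if all(c is not None for c in counts):
            kept[variant] = [c for c in counts if c is not None]
    return kept


def normalized_to_t0(counts: List[int]) -> List[float]:
    """Read numbers at each time point divided by the read number at t = 0."""
    return [c / counts[0] for c in counts]


# Hypothetical rows: variant -> read numbers at successive time points (first is t = 0).
table = {
    "AAAAAG": ["1000", "900", "750"],
    "TTATAT": ["1200", "600", "N/A"],
}
usable = complete_rows(table)
print({v: normalized_to_t0(c) for v, c in usable.items()})
```

Dropping a variant as soon as one measurement is missing is a conservative choice made here for simplicity. A variant with zero reads at t = 0 would also need to be excluded before normalization, which this sketch leaves to the caller.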
