Gene expression in higher eukaryotic cells orchestrates interactions between thousands of RNA-binding proteins (RBPs) and tens of thousands of RNAs (ref. 1). The kinetics by which RBPs bind to and dissociate from their RNA sites are critical for the coordination of cellular RNA–protein interactions (ref. 2). However, these kinetic parameters have not been experimentally measured in cells. Here we show that time-resolved RNA–protein cross-linking with a pulsed femtosecond ultraviolet laser, followed by immunoprecipitation and high-throughput sequencing, allows the determination of binding and dissociation kinetics of the RBP DAZL for thousands of individual RNA-binding sites in cells. This kinetic cross-linking and immunoprecipitation (KIN-CLIP) approach reveals that DAZL resides at individual binding sites for time periods of only seconds or shorter, whereas the binding sites remain DAZL-free for markedly longer. The data also indicate that DAZL binds to many RNAs in clusters of multiple proximal sites. The effect of DAZL on mRNA levels and ribosome association correlates with the cumulative probability of DAZL binding in these clusters. Integrating kinetic data with mRNA features quantitatively connects DAZL–RNA binding to DAZL function. Our results show how kinetic parameters for RNA–protein interactions can be measured in cells, and how these data link RBP–RNA binding to the cellular function of RBPs.
Customized R and Python scripts are available at https://github.com/deebratforlife/KIN-CLIP.
Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).
Licatalosi, D. D., Ye, X. & Jankowsky, E. Approaches for measuring the dynamics of RNA-protein interactions. Wiley Interdiscip. Rev. RNA 11, e1565 (2020).
Corley, M., Burns, M. C. & Yeo, G. W. How RNA-binding proteins interact with RNA: molecules and mechanisms. Mol. Cell 78, 9–29 (2020).
Ule, J., Hwang, H. W. & Darnell, R. B. The future of cross-linking and immunoprecipitation (CLIP). Cold Spring Harb. Perspect. Biol. 10, a032243 (2018).
Van Nostrand, E. L. et al. Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins. Genome Biol. 21, 90 (2020).
Gleitsman, K. R., Sengupta, R. N. & Herschlag, D. Slow molecular recognition by RNA. RNA 23, 1745–1753 (2017).
Jarmoskaite, I. et al. A quantitative and predictive model for RNA binding by human Pumilio proteins. Mol. Cell 74, 966–981 (2019).
Sutandy, F. X. R. et al. In vitro iCLIP-based modeling uncovers how the splicing factor U2AF2 relies on regulation by cofactors. Genome Res. 28, 699–713 (2018).
Hockensmith, J. W., Kubasek, W. L., Vorachek, W. R. & von Hippel, P. H. Laser cross-linking of nucleic acids to proteins. Methodology and first applications to the phage T4 DNA replication system. J. Biol. Chem. 261, 3512–3518 (1986).
Pashev, I. G., Dimitrov, S. I. & Angelov, D. Crosslinking proteins to nucleic acids by ultraviolet laser irradiation. Trends Biochem. Sci. 16, 323–326 (1991).
Russmann, C. et al. Crosslinking of progesterone receptor to DNA using tuneable nanosecond, picosecond and femtosecond UV laser pulses. Nucleic Acids Res. 25, 2478–2484 (1997).
Steube, A., Schenk, T., Tretyakov, A. & Saluz, H. P. High-intensity UV laser ChIP–seq for the study of protein–DNA interactions in living cells. Nat. Commun. 8, 1303 (2017).
Budowsky, E. I., Axentyeva, M. S., Abdurashidova, G. G., Simukova, N. A. & Rubin, L. B. Induction of polynucleotide-protein cross-linkages by ultraviolet irradiation. Peculiarities of the high-intensity laser pulse irradiation. Eur. J. Biochem. 159, 95–101 (1986).
Auweter, S. D. et al. Molecular basis of RNA recognition by the human alternative splicing factor Fox-1. EMBO J. 25, 163–173 (2006).
Chen, Y. et al. Targeted inhibition of oncogenic miR-21 maturation with designed RNA-binding proteins. Nat. Chem. Biol. 12, 717–723 (2016).
Jenkins, H. T., Malkova, B. & Edwards, T. A. Kinked β-strands mediate high-affinity recognition of mRNA targets by the germ-cell regulator DAZL. Proc. Natl Acad. Sci. USA 108, 18266–18271 (2011).
Zagore, L. L. et al. DAZL regulates germ cell survival through a network of polyA-proximal mRNA interactions. Cell Rep. 25, 1225–1240 (2018).
Hofmann, M. C., Narisawa, S., Hess, R. A. & Millán, J. L. Immortalization of germ cells and somatic testicular cells using the SV40 large T antigen. Exp. Cell Res. 201, 417–435 (1992).
Fu, X. F. et al. DAZ family proteins, key players for germ cell development. Int. J. Biol. Sci. 11, 1226–1235 (2015).
Lin, Y. & Page, D. C. Dazl deficiency leads to embryonic arrest of germ cell development in XY C57BL/6 mice. Dev. Biol. 288, 309–316 (2005).
Ruggiu, M. et al. The mouse Dazla gene encodes a cytoplasmic protein essential for gametogenesis. Nature 389, 73–77 (1997).
Saunders, P. T. et al. Absence of mDazl produces a final block on germ cell development at meiosis. Reproduction 126, 589–597 (2003).
Yang, C. R. et al. The RNA-binding protein DAZL functions as repressor and activator of mRNA translation during oocyte maturation. Nat. Commun. 11, 1399 (2020).
Haberman, N. et al. Insights into the design and interpretation of iCLIP experiments. Genome Biol. 18, 7 (2017).
Huppertz, I. et al. iCLIP: protein–RNA interactions at nucleotide resolution. Methods 65, 274–287 (2014).
Reynolds, N. et al. Dazl binds in vivo to specific transcripts and can regulate the pre-meiotic translation of Mvh in germ cells. Hum. Mol. Genet. 14, 3899–3909 (2005).
Itri, F. et al. Femtosecond UV-laser pulses to unveil protein-protein interactions in living cells. Cell. Mol. Life Sci. 73, 637–648 (2016).
Brister, M. M. & Crespo-Hernández, C. E. Direct observation of triplet-state population dynamics in the RNA uracil derivative 1-cyclohexyluracil. J. Phys. Chem. Lett. 6, 4404–4409 (2015).
Brister, M. M. & Crespo-Hernández, C. E. Excited-state dynamics in the RNA nucleotide uridine 5′-monophosphate investigated using femtosecond broadband transient absorption spectroscopy. J. Phys. Chem. Lett. 10, 2156–2161 (2019).
Paschotta, R. Encyclopedia of Laser Physics and Technology (Wiley-VCH, 2008).
Strober, W. Trypan blue exclusion test of cell viability. Curr. Protoc. Immunol. https://doi.org/10.1002/0471142735.ima03bs21 (2001).
Moore, M. J. et al. Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat. Protocols 9, 263–293 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Weyn-Vanhentenryck, S. M. et al. HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell Rep. 6, 1139–1152 (2014).
Zhang, C. & Darnell, R. B. Mapping in vivo protein–RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat. Biotechnol. 29, 607–614 (2011).
Schindler, D., Uschner, D., Hilgers, R.-D. & Heussen, N. randomizeR: randomization for clinical trials. R version 4.3.0 https://cran.r-project.org/web/packages/randomizeR/index.html (2019).
Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Fox, J. An R Companion to Applied Regression 3rd edition (Sage, 2019).
Thompson, H. W., Mera, R. & Prasad, C. The analysis of variance (ANOVA). Nutr. Neurosci. 2, 43–55 (1999).
Leschinski, C. Vignette: the MonteCarlo package. R version 4.3.0 https://cran.r-project.org/web/packages/MonteCarlo/vignettes/MonteCarlo-Vignette.html (2019).
Cao, J. & Zhang, S. A Bayesian extension of the hypergeometric test for functional enrichment analysis. Biometrics 70, 84–94 (2014).
Jolliffe, I. T. & Cadima, J. Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. Lond. A 374, 20150202 (2016).
Kerr, G., Ruskin, H. J., Crane, M. & Doolan, P. Techniques for clustering gene expression data. Comput. Biol. Med. 38, 283–293 (2008).
Krijthe, J. H. Rtsne: t-distributed stochastic neighbour embedding using a Barnes–Hut implementation. https://github.com/jkrijthe/Rtsne (2015).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Biggs, D., De Ville, B. & Suen, E. A method of choosing multiway partitions for classification and decision trees. J. Appl. Stat. 18, 49–62 (1991).
Goodman, L. A. Simple models for the analysis of association in crossclassifications having ordered categories. J. Am. Stat. Assoc. 74, 537–552 (1979).
Armstrong, R. A. When to use the Bonferroni correction. Ophthalmic Physiol. Opt. 34, 502–508 (2014).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Fabregat, A. et al. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinformatics 18, 142 (2017).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Magidson, J. Common pitfalls in causal analysis of categorical data. J. Mark. Res. 19, 461–471 (1982).
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees (Chapman & Hall/CRC, 1984).
Dua, D. & Graff, C. UCI Machine Learning Repository http://archive.ics.uci.edu/ml (University of California, School of Information and Computer Science, 2019).
Kass, G. V. An exploratory technique for investigating large quantities for categorical data. Appl. Stat. 29, 119–127 (1980).
We thank G. Varani for providing purified RBFOX(RRM) and RBFOXmut(RRM); A. Komar for the design of the codon-optimized DAZL construct; and W. Huang for assistance with the fluorescence polarization experiments. This work was supported by the NIH (GM118088 to E.J. and GM107331 to D.D.L.) and the NSF (CHE-1800052 to C.E.C.-H.).
D.S. and E.J. are founders of Bainom Inc. The remaining authors declare no competing interests.
Peer review information Nature thanks Rick Russell and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Schematic of the set-up of the fs laser. b, Degradation of RNA (38 nt) under steady-state and fs-laser illumination. Data points represent averages of three independent measurements. Error bars indicate 1 s.d. Lines show a linear trend. c, Dose absorbed over time for cross-linking with conventional UV (Stratalinker, 200 mJ cm−2, λ = 254 nm) and fs laser (2.6 mW). Error bars indicate 1 s.d. (n = 3 independent measurements). Lines show a linear trend. d, Representative denaturing PAGE for a cross-linking reaction of 50 nM RBFOX(RRM) (laser: 2.6 mW) (lanes 5–12) and control reactions with RNA only (lanes 1–3) and RBFOX(RRM) only (lane 4), with (lanes 2–4) or without (lanes 1 and 5) cross-linking. Three independent measurements provided similar results. e, Representative denaturing PAGE for a cross-linking reaction of 50 nM RBFOX(RRM) with Stratalinker (200 mJ cm−2, λ = 254 nm; lanes 4–8) and control reactions (lanes 1–3). Three independent measurements provided similar results. f, Time course of cross-linking reaction of 50 nM RBFOX(RRM) with Stratalinker (200 mJ cm−2, λ = 254 nm) versus fs laser (Fig. 1d). Data points are averages from triplicate experiments (error bars indicate 1 s.d.). g, h, RNA cross-linking time courses for DAZL(RRM) (g) and RBFOXmut(RRM) (h) with fs laser at different laser power and protein concentrations. Data points represent averages of three independent measurements (error bars indicate 1 s.d.). Lines show the fit to the data in Fig. 1e. i–k, Binding isotherms for RBFOX(RRM) (i), RBFOXmut(RRM) (j) and DAZL(RRM) (k) to cognate RNAs, measured by fluorescence anisotropy. Experiments were performed multiple times; all data points are shown. Apparent equilibrium binding constants (K1/2; Fig. 1e) were calculated with the quadratic binding equation. Source data
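The quadratic binding equation named for panels i–k can be illustrated with a short sketch. This is not the authors' fitting script: the legend does not spell out the equation, so the standard 1:1 binding form is assumed here, and the function name, argument names and example concentrations are hypothetical.

```python
import math


def fraction_bound(protein_total, rna_total, k_half):
    """Fraction of RNA bound for 1:1 binding (standard quadratic form).

    All three arguments must use the same concentration unit (e.g. nM).
    k_half plays the role of the apparent equilibrium binding constant.
    """
    if rna_total <= 0:
        raise ValueError("rna_total must be positive")
    b = protein_total + rna_total + k_half
    # Physically meaningful root of C^2 - b*C + P*R = 0 (C = complex).
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * rna_total)) / 2.0
    return complex_conc / rna_total


# Hypothetical titration: 1 nM labelled RNA and an assumed K1/2 of 50 nM.
for protein_nM in (0, 10, 50, 200, 1000):
    print(protein_nM, round(fraction_bound(protein_nM, 1.0, 50.0), 3))
```

Unlike the simpler hyperbolic form, the quadratic form stays valid when the RNA concentration is not negligible relative to K1/2, which is the usual reason for choosing it.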
a, Western blot of doxycycline-dependent DAZL expression in GC-1 cells. Four independent experiments provided similar results. b, Schematic of the time-resolved cross-linking approach in cells. Numbers mark the respective CLIP libraries. c, Representative PAGE for bulk DAZL–RNA cross-linking. Three independent experiments provided similar results. The intensity of cross-linked RNA (marked) is used to convert NGS reads to a concentration-equivalent parameter (for bulk cross-linking intensities and associated standard errors see Supplementary Table 6). d, Distribution of CLIP sequencing reads across RNA classes and mRNA regions for fs laser (4.2× DAZL, 2.6 mW) and conventional cross-linking (Stratalinker; 4.2× DAZL). Distributions for laser cross-linking experiments were calculated for binding sites with sequencing reads for all 12 measurements. Distributions for iCLIP experiments were calculated from three independent measurements (ref. 17). e, DAZL-binding sites identified by fs laser (KIN-CLIP) and conventional UV cross-linking (iCLIP) on all RNAs and 3′-UTRs. f, Metagene distribution of DAZL-binding sites identified by KIN-CLIP and iCLIP on 3′-UTRs proximal to stop codons and the PAS. The dotted lines mark the background of a random distribution of binding sites on 3′-UTRs. g, CITS analysis (refs. 36, 37) of 6-mer and 4-mer enrichment at 5′-termini of sequencing reads for KIN-CLIP (top) and iCLIP (bottom). The data indicate a virtually identical sequence context of cross-linking sites for KIN-CLIP and iCLIP. Sequence enrichment reflects the statistical overrepresentation of 6-mer and 4-mer sequences with respect to randomized sequences (z-score, 11-nt region, ±5 nt from the 5′-terminal nucleotide). Source data
Extended Data Fig. 3 Determination of kinetic parameters from fs-laser time-resolved DAZL–RNA cross-linking in cells.
a, Flow chart of the approach to calculate kinetic parameters for individual DAZL–RNA binding sites in cells (for details, see Methods). Unless otherwise stated, rate constants averaged from both approaches are used in subsequent data analyses. b, Scaling of Χ2 with the number of iterative fitting cycles for analytical and numerical approaches. c, d, Distribution of Χ2 at first and last (642) fitting cycle for analytical (c) and numerical (d) approaches (COD, coefficient of determination; R2, linear correlation coefficient). e–i, Correlation of parameters calculated with analytical and numerical fitting procedures for kon(4.2× DAZL) (e), kon(1× DAZL) (f), kdiss (g), kXL(2.6 mW) (h) and kXL(1 mW) (i). j, Correlation between cross-linking rate constants for low and high laser power. Rate constants are averaged from parameters obtained with numerical and analytical approaches. Cross-linking rate constants at higher laser power were larger than at lower laser power for 92% of binding sites. k, Confidence range for dissociation rate constants (for details, see Methods). l, Normalized read densities measured experimentally and calculated from the kinetic parameters for all DAZL-binding sites. m, Distribution of Χ2 for experimental values compared with values calculated with the kinetic parameters. Source data
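As a reading aid for the rate constants in this legend, the sketch below shows how an apparent association rate constant and a dissociation rate constant translate into dwell times and occupancy for a single site. It assumes a plain two-state (free/bound) model with a pseudo-first-order association step; it omits the cross-linking step (kXL) and is not the fitting procedure described in Methods. Function names and the numerical values are hypothetical.

```python
import math


def two_state_summary(k_on, k_diss):
    """Summarise a two-state (free <-> bound) binding site.

    k_on   : apparent association rate constant in s^-1 (fixed protein level)
    k_diss : dissociation rate constant in s^-1
    """
    if k_on <= 0 or k_diss <= 0:
        raise ValueError("rate constants must be positive")
    return {
        "mean_bound_time_s": 1.0 / k_diss,  # average residence time
        "mean_free_time_s": 1.0 / k_on,     # average time the site stays empty
        "steady_state_occupancy": k_on / (k_on + k_diss),
    }


def occupancy_at(t, k_on, k_diss):
    """Occupancy at time t (s) for a site that starts unbound."""
    k_obs = k_on + k_diss
    return (k_on / k_obs) * (1.0 - math.exp(-k_obs * t))


# Hypothetical site: fast dissociation, slow apparent association.
print(two_state_summary(k_on=0.01, k_diss=1.0))
```

With these made-up values the site is occupied for about 1 s at a time and empty for about 100 s, which is the qualitative pattern described in the abstract (short residence, markedly longer DAZL-free periods).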
a–d, Sequences surrounding DAZL-binding sites, arranged according to decreasing values for kon(4.2× DAZL) (a), kdiss (b), kXL(2.6 mW) (c) and Φmax (d). Sequences are aligned at the peak nucleotide (most frequent cross-link site (±11-nt peak nucleotide); Extended Data Fig. 2f, position 0). e–h, Frequency of 6-mer sequences surrounding DAZL cross-link sites (±111-nt peak nucleotide) in the top and bottom 5% of sequences arranged according to the kinetic parameters in a–d. i–l, Relative frequency of 6-mer sequences in the top and bottom 5% of sequences (e–h), arranged according to the kinetic parameters in a–d. Sequences below the diagonal line correspond to enrichment of a 6-mer in the top 5% versus the bottom 5%. A6, U6 and U3GU2 are most enriched in the vicinity of the binding sites with the fastest apparent association rate constants, compared to the binding sites with the slowest apparent association rate constants. No comparable enrichment is seen for other kinetic parameters. m–p, Relative frequency of 4-mers in the top and bottom 5% of sequences arranged according to the kinetic parameters in a–d. q–t, Distribution of association and dissociation rate constants, binding probabilities (P(4.2× DAZL)) and maximal fractional occupancy (Φmax) for binding sites (n = 8,696, binding sites with associated values for fractional occupancy) on different RNA classes. P values (one-way ANOVA, significant for P < 0.05) indicate inter-group differences. Φmax values, but not other parameters, vary significantly for different RNA classes (for box plots: vertical line, median; box limits, IQR; whiskers, 1.5 × IQR). u–x, Distributions of kinetic parameters for all binding sites (n = 8,212, binding sites with associated values for fractional occupancy) in the indicated mRNA regions (P value by one-way ANOVA; for box plots: vertical line, median; box limits, IQR; whiskers, 1.5 × IQR). 
kon(4.2× DAZL) and P(4.2× DAZL), but not the other parameters, vary significantly for different mRNA regions. Source data
a, Arrangement of DAZL-binding sites in 3′-UTRs. Binding sites are coloured according to kon(4.2× DAZL) and kdiss as indicated in the key. Right, number of clusters in the corresponding 3′-UTR. Colours mark the number of binding sites in a cluster, as indicated in the legend bar (right) (n = 1,313 3′-UTRs, 1,690 clusters, 6,085 binding sites). b, c, Distribution of DAZL-binding sites in 3′-UTRs closer than 500 nt to the PAS (b) or further than 500 nt from the PAS (c) as a function of the distance between neighbouring binding sites. The grey lines show the distribution if sites were randomly distributed across all 3′-UTRs (P values by one-sided t-test). d, Large windows: genome browser traces of representative 3′-UTRs with five clusters (Nucks1) and two clusters (D’Rik, D030056L22Rik). Bars show the normalized read coverage for 4.2× DAZL, 2.6 mW laser and 680 s cross-linking time. Numbers mark the distance between clusters. Small windows: magnification of cluster 1 of Nucks1, with three binding sites, and cluster 1 of D’Rik, with two binding sites (numbers mark the distance between binding sites). e, Number of clusters in 3′-UTRs with DAZL-binding sites. Colours show the number of binding sites in a cluster as indicated in a (red, 20; light yellow, 1). f, Distances between clusters in 3′-UTRs with two to four clusters. Number 1 represents the cluster most proximal to the PAS. g, Distribution of distances between neighbouring binding sites (n = 2,888) in clusters (2–9 binding sites). Number 1 represents the 3′ binding site (for box plots: vertical line, median; box limits, IQR; whiskers, 1.5 × IQR). h–j, Correlation between the number of binding sites (n = 6,546) for clusters proximal (blue, <0.5 kb) and distant (red, ≥0.5 kb) to the PAS and P(4.2× DAZL) (h), dissociation rate constants (kdiss; i), and maximum fractional occupancy (Φmax; j), for individual binding sites in a given cluster. 
P values (one-way ANOVA) indicate significant inter-group differences for P(4.2× DAZL) and Φmax, but not for kdiss. P(4.2× DAZL) and Φmax depend on kon(4.2× DAZL), which correlates with the number of binding sites in a cluster (Fig. 3c). For box plots: vertical line, median; box limits, IQR; whiskers, 1.5 × IQR. k, Correlation between kinetic parameters of individual binding sites in clusters with 6, 5, 4 and 3 binding sites. The Pearson correlation coefficient is indicated. Binding site number 1 indicates the 3′ binding site in a cluster. Source data
Extended Data Fig. 6 Link between DAZL binding in 3′-UTRs and effects on mRNA level and ribosome association.
a, Correlation between cumulative binding probabilities (ΣB) and number of binding sites in a cluster (n = 1,313 3′-UTRs, 6,085 binding sites, 1,690 clusters in transcripts with associated values for ΔRNA and ΔRPF). b, Correlation between ΣB and distance of the cluster from the PAS. c, d, Correlation of ΣB terciles (Fig. 4a) and changes in ribosome association (ΔRPF; Fig. 4b) (c) or changes in transcript levels (ΔRNA; Fig. 4b) (d) for the corresponding transcripts (n = 968) between low (1× DAZL) and high (4.2× DAZL) concentration (P values by one-way ANOVA). For UTRs with multiple clusters, the cluster closest to the PAS was used (for box plots: vertical line, median; box limits, IQR; whiskers, 1.5 × IQR). e, Distribution of binding probabilities for individual DAZL-binding sites in 3′-UTRs for transcripts in THRH, THRM, TLRM, TLRL, TMRH, TMRL mRNA classes (Fig. 4b). The dotted lines mark terciles (for details, see Methods). f, Correlation between binding probabilities for individual binding sites and functional mRNA classes (Fig. 4b). Colours mark the enrichment of a given ΣB tercile compared to a random distribution (one-sided hypergeometric test; red, P < 0.0005 to 0.05; shades of yellow, P > 0.05 to 0.5 (not enriched)). No significant enrichment is observed. g, Distribution of cumulative binding probabilities for DAZL clusters in 3′-UTRs with scrambled binding sites. The dotted lines mark terciles. h, Correlation between cumulative binding probabilities of DAZL clusters with binding sites scrambled between clusters (g) and functional mRNA classes (Fig. 4b). Colours mark the enrichment of a given ΣB tercile compared to a random distribution (one-sided hypergeometric test; red, P < 0.0005 to 0.05; shades of yellow, P > 0.05 to 0.5 (not enriched)). No significant enrichment is observed. i, Correlation between additive binding probabilities of two DAZL sites in a cluster and functional mRNA classes. 
Colours mark the enrichment of a given ΣB tercile compared to a random distribution (one-sided hypergeometric test; red, P < 0.0005 to 0.05; shades of yellow, P > 0.05 to 0.5 (not enriched)). For clusters with more than two binding sites, permutations of two sites were tested and sites with the highest additive binding probability were selected. The model tests whether the additive binding probability of any two DAZL-binding sites in a given cluster can explain the effect of DAZL on the transcript to the same extent as considering cumulative binding probabilities for the entire cluster (Fig. 4c). The model is only able to explain the TLRL, TLRM mRNA classes, which frequently contain transcripts with clusters that have only few DAZL-binding sites. j, Correlation between conditional binding probabilities of two DAZL sites in a cluster (terciles) and functional mRNA classes. Colours mark the enrichment of a given ΣB tercile compared to a random distribution (one-sided hypergeometric test; red, P < 0.0005 to 0.05; shades of yellow, P > 0.05 to 0.5 (not enriched)). For clusters with more than two binding sites, permutations of two sites were tested and combinations of sites with the highest multiplicative binding probability were selected. The model tests whether the conditional binding probability of any two DAZL-binding sites (for example, whether DAZL needs to bind simultaneously to both sites) in a given cluster can explain the effect of DAZL on the transcript to the same extent as considering cumulative binding probabilities for the entire cluster (Fig. 4c). The model explains only mRNA classes that frequently contain transcripts with DAZL clusters that have only few binding sites. For these clusters cumulative and conditional binding probabilities scale similarly. The data suggest that simultaneous binding of DAZL to two sites in a cluster is not required for general DAZL function. 
k, Correlation between conditional binding probabilities of three DAZL sites in a cluster (terciles) and functional mRNA classes. Colours mark the enrichment of a given ΣB tercile compared to a random distribution (hypergeometric test, one-sided, red: P < 0.0005 to 0.05, shades of yellow: P > 0.05 to 0.5, not enriched). Analysis was performed as in j (Fig. 4c). The data suggest that simultaneous binding of DAZL to two or more sites in a cluster is not required for DAZL function. Source data
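The one-sided hypergeometric test used throughout this legend for tercile enrichment can be sketched as follows. This is a minimal standard-library illustration rather than the authors' script; the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """One-sided P(X >= observed) for a hypergeometric variable X.

    population : all transcripts considered
    successes  : transcripts in the tercile of interest
    draws      : transcripts in the functional mRNA class
    observed   : class members that fall in the tercile
    """
    lower = max(observed, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical counts: 300 transcripts, 100 in the top tercile,
# a class of 30 transcripts of which 18 fall in the top tercile.
p = hypergeom_enrichment_p(300, 100, 30, 18)
print(f"enrichment P = {p:.3g}")
```

A small P value indicates that the class contains more members of the tercile than expected from a random draw, which is how the colour scale in the panels is described.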
Extended Data Fig. 7 Link between DAZL clusters in 3′-UTRs and effects on mRNA level and ribosome association.
a, Distribution of transcript levels at 4.2× DAZL. b, Distribution of 3′-UTR lengths (refs. 17, 39, 40). For UTRs with multiple lengths, coordinates for the longest 3′-UTR were used. c, Distribution of distances of DAZL clusters from the PAS. d, Distribution of differential cumulative binding probability (ΔΣB) for all DAZL clusters. The dotted lines mark terciles. Terciles were defined using the standard deviations from the mean obtained for each feature described above. e, Link between the effect of DAZL on mRNA level and ribosome association and cluster features (top graphs: black line, number of DAZL clusters in 3′-UTR; blue vertical lines, ΣB, with the lower end marking ΣB at 1× DAZL and the upper end marking ΣB at 4.2× DAZL; middle graphs: ΔΣB for each cluster and number of DAZL-binding sites in each cluster; heat maps below the graphs: terciles of transcript features obtained from a–c). Each panel shows one functional mRNA class (defined in Fig. 4b). Functional classes not displayed contained too few or no transcripts (TLRH, 0; THRL, 2) or showed no change in ribosome association and transcript level (TMRM). Numbers represent the groups in the DAZL code (Fig. 4d). Clusters with ΣB > 1 (n = 4) are not shown. Source data
a, Pairwise correlation between DAZL cluster features. Colours correspond to Pearson’s correlation coefficient. Cluster features are marked as indicated on the right. b, Variance of data reflected in the eigenvalues of the seven principal component axes obtained by PCA. Each eigenvalue corresponds to a principal component axis. Each axis reflects a linear combination of the seven characteristics of a DAZL cluster, obtained from a. The eigenvalues and the corresponding principal component axis are sorted according to the initial variance they represent. The first three principal component axes explain roughly 90% of the variance. c, Biplots of DAZL cluster features (arrows) projected on the first two principal components (PC1, PC2; b). Dots represent transcripts. Colours correspond to terciles of the distributions of values for ΔRPF and ΔRNA as defined in Fig. 4b. Each arrow represents a cluster feature (labels as in a). Proximity of arrows scales with correlation between the corresponding features. Arrows in the x direction (positive or negative) contribute to PC1; arrows in the y direction (positive or negative) contribute to PC2. Short arrows (transcript level, proximity to PAS) indicate that additional principal components (PC3–PC7) are required to explain the corresponding feature. d, t-SNE (perplexity = 10, iterations = 2,000) of cluster features (a). Identified groups are marked 1–21. Each point represents a transcript. e, Biplots of DAZL cluster features (arrows) projected on three principal components (PC1–PC3; b). Dots represent transcripts. Colours correspond to functional mRNA classes (THRH, THRM, TLRM, TLRL, TMRH, TMRL; Fig. 4b). Separation of transcripts in 21 groups is marked as 1–21. f, Link between functional mRNA classes and kinetic parameters (ΣB, ΔΣB), cluster features (number of binding sites in cluster, proximity to PAS) and UTR features (numbers of clusters on UTR, UTR length, transcript level). Left, enrichment of terciles (H, M, L; Fig. 4a, Extended Data Fig. 7a–d) for ΣB, ΔΣB, number of binding sites in cluster, cluster distance from PAS, UTR length and transcript level in group 1. Numbers and colour indicate the degree of enrichment. The row on the left marks the visualization of the DAZL code for group 1 that is used in Fig. 4d. Right, enrichment of terciles for the features indicated in the left panel for all groups (1–21). Functional mRNA classes for the respective groups are shown at the bottom. g, Genome browser traces of representative transcripts of select groups. mRNA classes are indicated. The y axis represents normalized coverage value. h, Mapping of transcripts from select groups on two biological networks. Groups are coloured as indicated. Proximity of transcripts of a given group in the network indicates closely related biological functions. Source data
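The statement in panel b about the variance explained by the first principal components follows directly from the PCA eigenvalues. The sketch below shows the arithmetic only. The eigenvalues are invented for illustration (chosen so that seven values sum to 7, as they would for seven standardized features) and are not taken from the paper; the function name is hypothetical.

```python
def cumulative_explained_variance(eigenvalues):
    """Cumulative fraction of variance explained, largest eigenvalue first."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    running = 0.0
    fractions = []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        fractions.append(running / total)
    return fractions


# Made-up eigenvalues for seven features (NOT the values from the paper).
example = [3.5, 1.75, 1.05, 0.35, 0.2, 0.1, 0.05]
for rank, fraction in enumerate(cumulative_explained_variance(example), start=1):
    print(f"PC1-PC{rank}: {fraction:.2f}")
```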
Extended Data Fig. 9 Decision tree classification linking the DAZL code to the functional effects of DAZL binding.
a, Decision tree classifier (CHAID algorithm; refs. 56, 57, 58) of seven features (ΣB, ΔΣB, distance to PAS, 3′-UTR length, transcript level, number of clusters in a given 3′-UTR (Clust./3′-UTR), and number of sites in cluster; Extended Data Fig. 8) in terciles (Extended Data Fig. 7). Nodes (◊) mark the given feature and corresponding partition (high, medium, low). Circles indicate the number of transcripts; donut graphs mark the functional mRNA classes, colour-coded as shown on the right. Circled numbers left of the heat map with the DAZL code (identical to that in Fig. 4d) indicate the number of transcripts in a given group. The decision tree was calculated by cross-tabulation of predictor variables (transcripts, n = 413) with target variables (functional mRNA classes THRH, THRM, TLRM, TLRL, TMRH and TMRL; Fig. 4b) followed by partitioning of predictor variables into statistically significant subgroups (Χ2 test for independence with significance threshold 0.05 (ref. 59); Supplementary Table 10). b, Confusion matrix corresponding to the decision tree. Validation 1 (n = 24 transcripts) and Validation 2 (n = 21 transcripts) are predictions for transcripts that were not included in the decision tree classification. Source data
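The cross-tabulation step described for the decision tree rests on a Χ2 test for independence between a predictor (for example, the tercile of one feature) and the functional mRNA class. A minimal standard-library sketch of the test statistic is given below. It is not a CHAID implementation, the function name and the example table are hypothetical, and it assumes that no row or column of the table is empty.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a count table.

    table : list of rows, each a list of non-negative counts
            (rows = predictor categories, columns = target classes).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 3 x 2 table: feature terciles (rows) vs two mRNA classes.
stat, dof = chi_square_independence([[30, 10], [20, 20], [10, 30]])
print(stat, dof)
```

Converting the statistic to a P value requires the chi-square distribution, for example `scipy.stats.chi2.sf(stat, dof)` if SciPy is available. A CHAID-style split would then be accepted only when that P value is below the chosen threshold (0.05 in the legend).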
Extended Data Fig. 10 Linear regression models for linking the DAZL code to the effects of DAZL on changes in transcript levels, ribosome association and translation efficiency.
a, Distribution of changes in translational efficiency values (ΔTE) between high and low DAZL concentration for transcripts in the 21 groups of the DAZL regulatory program, defined in Fig. 4d. mRNA functional classes are defined in Fig. 4b. The grey area in the plot centre marks unchanged ΔTE (95% confidence interval). P values were calculated by one-way ANOVA of inter-group variations for each mRNA functional class (for box plots: horizontal line, median; box limits, IQR; whiskers 1.5 × IQR). b, Linear regression models tested (yellow, dummy coding, using terciles of the variables (Extended Data Fig. 8); red, no dummy coding, use of continuous data; grey, variable was omitted). c, Adjusted R2 for each model. d, DILCs for each model. Grey boxes mark models without the respective variable. e, Significance of each DILC for each model (white, significant (P < 0.005 to 0.05); black, not significant (P > 0.05); P values by one-sided Student’s t-test on each coefficient). Only model 1 (M1) shows consistently significant DILCs. Models 24–27 include interaction terms corresponding to 7 independent variable terms and test the effect of multi-collinearity. Interaction terms for each of the models were as follows: M24: ΣB | ΔΣB and ΣB | number of binding sites in a cluster. M25: ΣB | ΔΣB. M26: ΣB | ΔΣB and ΣB | Proximity to PAS. M27: ΣB | Proximity to PAS. Interaction terms are the cross product of encompassing independent variable terms and were selected based on pairwise correlation coefficients (Extended Data Fig. 8a). f, Linear regression model linking the DAZL regulatory program to changes in translational efficiency values (ΔTE) (a). Points represent the DILC (red, DILCs for translational efficiencies that increase at high DAZL concentration; black, DILCs for translational efficiencies that decrease at high DAZL concentration). g, Correlation between experimental values for ΔTE and values predicted with the linear regression model for the test dataset. 
h, Correlation between predicted values for ΔRPF and changes in luciferase activity between high and low DAZL concentration for reporter RNA constructs. Reporters were generated by appending the 3′-UTR of the respective transcripts to a luciferase ORF, and measurements were performed as described previously (ref. 17). Error bars represent s.e.m. for each data point, corresponding to five independent experiments. Naa40 and Ptma were part of the model building dataset (training dataset). Calm2, Cxcl1, D’Rik and Spp1 were part of the test dataset. Source data
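Panel b distinguishes models that use dummy-coded terciles from models that use continuous values. The sketch below illustrates what dummy coding of a tercile variable looks like before it enters a linear regression. The choice of the low tercile as reference level, the labels "L", "M" and "H", and the function name are assumptions for illustration, not details taken from the paper.

```python
def dummy_code(levels, categories=("L", "M", "H")):
    """Dummy-code tercile labels, using the first category as reference.

    Returns one row per input label with an indicator column for every
    non-reference category (here: M and H).
    """
    _reference, *others = categories
    rows = []
    for level in levels:
        if level not in categories:
            raise ValueError(f"unknown level: {level!r}")
        rows.append([1 if level == category else 0 for category in others])
    return rows


# Hypothetical tercile labels for one feature across three transcripts.
print(dummy_code(["H", "L", "M"]))  # [[0, 1], [0, 0], [1, 0]]
```

Each indicator column then receives its own regression coefficient, so the model can assign different effects to the medium and high terciles relative to the reference without assuming a linear dependence on the underlying continuous value.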
This file contains Supplementary Methods, Supplementary Tables S1–S7, Supplementary Figures S1–S7 and Supplementary Schemes S1 and S2.
Binding site kinetic parameters.
Metrics for Dazl binding site clusters.
Decision Tree parameters.
Ribo-Seq and RNA-seq data.
GO term gene list.
Cite this article
Sharma, D., Zagore, L.L., Brister, M.M. et al. The kinetic landscape of an RNA-binding protein in cells. Nature 591, 152–156 (2021). https://doi.org/10.1038/s41586-021-03222-x