Recent methodological advances allowed the identification of an increasing number of RNA-binding proteins (RBPs) and their RNA-binding sites. Most of those methods rely, however, on capturing proteins associated to polyadenylated RNAs which neglects RBPs bound to non-adenylate RNA classes (tRNA, rRNA, pre-mRNA) as well as the vast majority of species that lack poly-A tails in their mRNAs (including all archea and bacteria). We have developed the Phenol Toluol extraction (PTex) protocol that does not rely on a specific RNA sequence or motif for isolation of cross-linked ribonucleoproteins (RNPs), but rather purifies them based entirely on their physicochemical properties. PTex captures RBPs that bind to RNA as short as 30 nt, RNPs directly from animal tissue and can be used to simplify complex workflows such as PAR-CLIP. Finally, we provide a global RNA-bound proteome of human HEK293 cells and the bacterium Salmonella Typhimurium.
RNA-binding proteins are key factors in the post transcriptional regulation of gene expression. Spurred by recent technological advances such as RNA interactome capture (RIC)1,2,3, the number of RBPs has greatly increased. A powerful tool to study ribonucleoproteins (RNPs) is UV cross-linking: irradiation of cells with short wavelength UV light results in covalent cross-links of proteins in direct contact with the RNA (Fig. 1a)4,5,6. Exploiting the stability of cross-linked RNPs, new methods have been developed to identify and analyse RNPs: (i) RNA interactome capture in which poly-A RNA and its bound proteins are first selected by oligo-dT beads and co-purified proteins subsequently identified by mass spectrometry (MS). This led to the discovery of hundreds of hitherto unknown RBPs7,8. (ii) Cross-linking and immunoprecipitation (CLIP) and similar methods in which, after UV cross-linking, individual RBPs are immunoprecipitated, and co-precipitated transcripts are identified by RNA-Seq, yielding high resolution data on the RNA binding sites of the RBPs of interest9,10,11.
As RNA interactome capture relies on the purification of cross-linked RNPs based on hybridisation of oligo-dT beads to oligo-A sequences typically found in eukaryotic messenger RNAs, RBPs that exclusively associate with non-adenylate RNA species such as e.g. rRNA, tRNAs, snRNAs, histone mRNAs, or numerous lncRNAs cannot be identified.
The same limitations apply to mRNA from bacteria and archaea that lack poly-A tails in general. Recently, RNA interactome using click chemistry (RICK12 and CARIC13) has been introduced in which labelled RNA along with UV cross-linked interacting proteins was purified in a poly-A-independent fashion. However, the method requires efficient in vivo labelling of RNA, limiting its application to suitable (cell culture) systems. Consequentially, no RNA-bound proteomes of prokaryotes have been determined by biochemical means to date.
A commonly used protocol to purify RNA from whole-cell lysates is the single-step method14, also marketed as “Trizol”. First, chaotropic conditions and ionic detergents are employed to denature cellular components, followed by a biphasic extraction using the organic compound phenol. During this treatment, nucleic acids are specifically enriched in the aqueous phase. Furthermore, the pH during extraction allows to control if DNA and RNA (neutral pH) or only RNA (acidic pH) accumulate in the aqueous phase (acidic conditions shown in Fig. 1b, right panel).
Here, we describe a method that builds on the single step principle to separate RNA, proteins and cross-linked RNA–protein complexes in biphasic extractions according to their physicochemical differences. Following the rationale that phenol and toluol (toluene) share a similar chemical structure but toluol is less water soluble due to the lack of the OH group (Fig. 1b), we modified the extraction chemistry using a mixture of phenol:toluol. This alters enrichment of the biomolecule classes in the extraction and furthermore enabled us to shift cross-linked RNPs into the aqueous or interphase, respectively (Fig. 1b, left panel). Combining both separation strategies allowed us to sequentially deplete the sample of DNA and lipids, as well as non-cross-linked RNA and proteins, highly enriching for cross-linked RNPs (clRNPs) that then can be directly analysed or further processed in more complex workflows.
PTex enriches for cross-linked RNPs
The poly-A RNA interactome of human HEK293 cells has been mapped in great depth, providing an ideal reference to establish PTex (Phenol Toluol extraction)-based purification of cross-linked RNPs2. After irradiation with UV light at 254 nm to induce covalent RNA-protein cross-links, cells were subjected to the PTex procedure, a series of three consecutive organic extractions:
Step 1: Phenol:Toluol (PT; 50:50), pH 7.0
Step 2: Phenol, pH 4.8, (chaotropic) detergents
Step 3: Phenol, EtOH, water, pH 4.8
During the first extraction with Phenol/Toluol, RNA, proteins and cross-linked RNPs (clRNPs) are accumulating in the upper aqueous phase while DNA and membranes are predominantly found in the interphase (Fig. 1b left panel). The aqueous phase is subsequently extracted twice under chaotropic and acidic conditions using phenol14. Now, free RNA accumulates in the upper aqueous phase, free proteins in the lower organic phase and clRNPs migrate to the interphase4 (Fig. 1b right panel). Finally, the complexes in the interphase are precipitated using ethanol15. To track the distribution of the diverse cellular molecules from total HEK293 cells during the purification procedure, we probed all phases from the intermediary steps by western blotting against HuR (ELAVL1), a well established 35 kDa RBP16 (Fig. 1c). UV-cross-linking produces an additional band at high molecular weight which indicates the RNA-cross-linked fraction of HuR (clHuR; Supplementary Fig. 1). In the PTex fraction (interphase 3), clHuR was highly enriched whereas free HuR was strongly reduced. Furthermore, abundant cellular proteins unrelated to RNA-binding such as beta-actin are not detectable in the PTex fraction. To further demonstrate the efficient removal of non-cross-linked proteins, we spiked into the cell lysates a recombinant RNA-binding protein (the central domain of Drosophila melanogaster Sex-lethal, denoted Sxl-RBD417), after UV cross-linking. PTex efficiently removes ∼99% of the free spike-in RBP (as determined by densitometry compared to the input; Fig. 1c). Similarly, removal of free RNA was demonstrated in an in vitro assay in which 32P-5′ labelled RNA was subjected to PTex (Fig. 1d). We next tested for depletion of DNA by PCR targeting genomic DNA (exon 5 of the IL3 gene) or plasmid DNA (pUC19). DNA is removed during the first two PTex steps (Fig. 1e). We then tested for additional well-established RBPs (Fig. 1f), namely polypyrimidine tract binding protein 1 (PTBP1), fused in sarcoma (FUS), and the more recently identified RNA-binding enzymes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and enolase (Eno1);3 all of which are enriched by PTex in a UV-irradiation-dependent manner whereas the highly abundant DNA-binding histone H3 is depleted. We additionally used PTBP1 and FUS to demonstrate that RNase digestion prior to PTex abolishes efficient enrichment of RBPs (Fig. 1g, Supplementary Fig. 9), consistent with selectivity for RBPs complexed with RNA. In sum, PTex highly enriches for cross-linked RNPs while efficiently depleting non-cross-linked proteins or nucleic acids.
We then set out to critically assess the performance of PTex to purify RBPs. However, while cross-linked HuR generates a signal at the height of the gel pocket (compare Fig. 1c and Supplementary Fig. 1), we have no a priori knowledge about the amount of protein which is efficiently cross-linked by UV light to RNA. Thus, we performed RNA interactome capture1 on HEK293 cells to obtain proteins which are 100% cross-linked to poly(A) RNA. We used this sample as input for PTex (Fig. 2a) and quantified purification of HuR by densitometry (Fig. 2b), resulting in a ∼30% recovery of the cross-linked protein input. To analyse the minimal length of RNA required for PTex-mediated enrichment of RNPs, we again employed the recombinant and highly purified 20 kDa Sxl-RBD4 protein which associates with Uracil stretches of 7 nucleotides or longer18. We produced in vitro transcribed RNAs with lengths varying between 13 and 191 nucleotides, all of which contained the same Sxl-binding motif at their 5′-end. After binding and UV cross-linking in vitro, samples were PTex-purified and analysed by western blotting (Fig. 2c). PTex efficiently recovered cross-linked Sxl-RBD4 complexes with RNA as short as 30 nt. Similar to HuR, we calculated the recovery of Sxl by densitometry (Fig. 2d) which is ∼50% of the input cross-linked protein. Since purification efficiency differs for individual proteins (see Figs. 1c–f and 2a–d), we quantified protein and RNA recovery by PTex using spectroscopy and measured absorption at 260 nm and 280 nm as proxy for RNA and protein, respectively (Fig. 2e). Consistent with the other results, overall PTex recovery is 27% (RNA) and 33% (proteins) from cross-linked HEK293 interactome capture samples (Supplementary Data 4).
Finally, we estimated the relative enrichment of free vs. cross-linked protein in input and PTex (Fig. 2f, g). For HuR (in vivo), PTex largely enriches for RNA-bound HuR with a background of 10–15% non-UV-cross-linked HuR. For Sxl, PTex-mediated enrichment is even larger if compared to the input; however, free Sxl is more prominently detected in the artificial in vitro conditions only but not when spiked-in to cell lysates (Fig. 1c) where the free protein is removed almost completely. Interestingly, we only recover non-cross-linked proteins in the PTex fraction when interrogating RNA-binding proteins but not in the case of other abundant cellular proteins such as histone H3 or beta-actin (see Fig. 1c–g). We attribute this to either stable RNA-protein complexes which resist complete denaturation and separation during the PTex procedure or an artefact of the gel system as RNA strand breaks after PTex purification could result in a western blot signal akin to the free/unbound RBP. In both cases, only bona-fide RBPs will be enriched.
Purification of RBPs from animal tissue
To test if PTex can be directly applied to tissues, we UV cross-linked whole mouse brain samples and performed PTex to extract clRNPs directly (Fig. 3). Since brain tissue is particularly rich in lipids which accumulate in the interphase during step 1 of the PTex protocol, we increased the temperature during extractions to 65 °C (Hot-PTex). The extracted clRNPs were analysed by western blotting. Cross-linked HuR is largely enriched over non cross-linked HuR after PTex (Fig. 3 lower panel). Weak detectable bands at 70 or ∼110 kDa, respectively, are likely HuR dimers and trimers as observed before19. This example demonstrates that PTex is not only suited for cell culture but can also extract RNPs directly from animal tissue samples; an advantage over purification protocols that depend on RNA labelling (such as PAR-CLIP11, RICK12, or CARIC13) for which efficient uptake and incorporation of nucleotide analogues can be challenging. Brain tissue can be efficiently UV cross-linked, however, as demonstrated by HITS-CLIP experiments20.
A simplified CLIP protocol
HuR (ELAVL1) has been shown to interact with mRNA and pre-mRNA in several CLIP studies and has a well-documented binding motif (5′- UUUUUU −3′)16. After in vivo labelling of cellular RNA using 4-thiouridine (4SU) and UV irradiation at 365 nm, we performed (i) classical PAR-CLIP analysis (PAR-CLIP-classic)11 of HuR, (ii) a PAR-CLIP variant using on-bead ligation of adaptors (PAR-CLIP-on-beads)21,22, and (iii) a version in which we use phenol extraction (termed pCLIP) for removal of unbound RNA instead of PAGE/membrane excision (Fig. 4a). We found that pCLIP libraries contained a larger fraction of longer reads than the PAR-CLIP classic/PAR-CLIP-on-beads libraries (Fig. 4b). All three approaches identify the canonical 5′- UUUUUU −3′ motif and similar profiles of HuR-bound RNA clusters map to intronic and 3′UTR regions in all three variants (Fig. 4c–e). The clusters could also be mapped to the same 3′UTR loci when comparing HuR binding sites in tubulin and splicing factor Srsf6 mRNA (Fig. 4f, g). Although we only performed a low-read-coverage experiment as proof-of-principle, our results demonstrate that phenolic extractions of RNPs such as PTex can be integrated into more complex workflows such as (PAR-)CLIP and have the potential to simplify CLIP-type approaches by enriching for clRNPs or remove unbound RNA transcripts.
A global snapshot of human RNA–protein complexes
Despite the recent advances in mapping RBPs in many species, two general issues have not been addressed to date: (i) the fact that RNA interactome capture1,2,3 targets only polyadenylated RNAs suggests that many RBPs that bind non-adenylate RNAs are missed by this experimental approach;12,13 and (ii) although UV cross-linking has been widely used to research RNA–protein interactions, no systematic study has been performed to determine the optimal irradiation conditions for efficient cross-linking of individual RNPs. Using PTex as an unbiased approach, we set out to explore RNA-protein interactions cell-wide in HEK293 cells. To test the effect of different energy UV-irradiation, we employed besides the most commonly used 0.15 J/cm2 irradiation at 254 nm wavelength also irradiation at two additional energy levels: 0.015 and 1.5 J/cm2; spanning two order of magnitude of UV irradiation dosage. This setup was then used to comprehensively map RNA-protein interactions beyond the established poly-A RNA-bound proteomes2,8. We independently irradiated HEK293 cells at all 3 energies and performed PTex purification of cross-linked and non-cross-linked RNPs from whole cells (Fig. 5a). Transcriptomics by RNA-Seq (Fig. 5b, c), and proteomics by MS and label-free quantification (LFQ) (Fig. 5d–g) were performed using total RNA and protein preparations as input controls.
RNA in PTex-purified RNPs
We first analysed PTex-purified transcripts. Unlike proteins which can be grouped into RNA-interactors and non-interactors, all cellular RNA can be expected to be associated to proteins23,24. In line with this, we find a similar distribution of RNA classes when comparing RNA from inputs and PTex by RNA-Seq with the vast majority of transcripts being ribosomal RNA (Supplementary Fig. 21). Protein-cross-linked RNA is known to enrich for mutations during reverse transcription in the RNA-Seq workflow (Fig. 5b, Supplementary Fig. 17)10,11. Such mutations can then be used as beacons to map protein binding sites in transcripts2,11. We used pyCRAC25 to map deletion and substitution mutations enriched in UV-treated PTex samples to 100 nt windows around the 5′ AUG start codon and 3′ stop codon of mRNA reads (Fig. 5c, Supplementary Fig. 18). We find that most mutations (indicating protein binding) are within the first 100 nt after the AUG and the last 100 nt before the Stop codon. This was observed before in a global protein occupancy profiling study26 and can potentially be attributed to cross-linking of ribosomal proteins or translation initiation/termination factors to mRNA, as ribosomal profiling experiments show increased ribosomal footprint densities at these regions, indicating longer dwell times and a higher potential for cross-linking at these sites27.
Proteins in PTex-purified RNPs
Proteins which were not identified by MS in all 3 replicates after PTex were removed. For the remaining proteins, ratios of cross-linked over non cross-linked (CL/-CL) LFQ intensities (from the PTex experiments) were calculated (Fig. 5d). P-values from a moderated t-test were then used for multiple testing (Benjamini-Hochberg). Using these stringent tests, we identify 3188 shared among the three conditions; out of these, 3037 proteins are significantly enriched in a UV-irradiation-dependent fashion in all samples using a cut-off of FDR 0.01 (Fig. 5e, Supplementary Data 1). Analysis of general protein features (molecular mass, pI, cellular abundance, hydrophobicity, Fig. 5f, g) demonstrates that the PTex procedure does not enrich for a particular subgroup of cellular proteins based on chemical properties or expression level. We picked two of the 3037 proteins which have not been reported to bind RNA: ATP-binding cassette sub-family F member 2 (ABCF2), a member of the AAA + ATPase family (see below) and T-complex protein 1 subunit eta (CCCT7) as part of the chaperonin CCT/TRiC which is involved in telomere maintenance28. Both are enriched after UV-irradiation in vivo and PTex purification (Fig. 5h, i), indicating RNA-association.
In principle, extended UV exposure should increase the chance for cross-linking events and thus for subsequent protein recovery by PTex3. We therefore expected a gradual increase in enrichment of RNA-interacting proteins after PTex with higher UV dose. However, we only find increased recovery for most proteins when comparing low and medium energies. Surprisingly, irradiation with 1.5 J/cm2 on the other hand leads to a significant decrease in recovery (Supplementary Fig. 23). We observed RNA degradation when analysing total RNA from HEK293 cells irradiated with 1.5 J/cm2 254 nm UV light (Supplementary Figs. 20, 21;3). If this degradation was due to damage of nucleic acids induced by high energy UV light or a result of secondary processes during the extended time of treatment is unclear; in any case extensive shortening of RNAs will cause a loss of RNPs purified by PTex.
HEK293 RNA-interacting proteins
So far, 700–2000 well-established and recently identified eukaryotic RBPs have been described (reviewed in refs. 29 and8) and PTex-purified protein patterns differ from the whole proteome in silvers stains (Supplementary Fig. 22). However, to find more than 3000 proteins to be enriched as RNA-associated by PTex is unexpected. Considering that deep proteome studies detect around 10,500 proteins in HEK293 cells30, we find nearly a third of the expressed cellular proteome to be associated with RNA.
To test sensitivity and specificity of our approach, we first performed global GO enrichment analysis showing that terms from all aspects of RNA biology are the most enriched among PTex-purified proteins (Fig. 6a). At the same time, protein classes with no general role in RNA biology such as transporters and (trans-)membrane proteins were depleted by PTex (Supplementary Data 2). Known RNA-binding domains (RBDs) such as the RNA recognition motif (RRM), helicase folds (DEXDc, HELICc) or K homology (KH) domain were significantly enriched among PTex-purified proteins (Fig. 6b; Supplementary Data 2). 89 PTex-enriched proteins contain a WD40 fold; a domain found to directly bind snRNA in Gemin 531, taking part in rRNA biogenesis (Erb132) and found in RBPs1. Other enriched domains are: AAA (ATPase, see below) fold, tetratrico peptide repeat region (TPR) as found in the yeast Clf1p splicing factor33, Ski complex34 and the translation terminator Nro135, and the CH domain which is found among actin-binding proteins3.
To test that we are enriching for RNA-binders specifically, we calculated the probability of recovering known RBPs interacting with different RNA classes using the hypergeometric test: we enrich for ribosomal proteins (42/47 large subunit and 30/33 small subunit;36 p-value: 1.36 × 10–30), NSUN2 and tRNA synthetases (19/20 cytosolic; p-value: 7.4 × 10–10) indicating that our approach indeed captured cellular RNPs in a poly-A-independent fashion. We recover 70% of poly-A RNA-binding proteins found in HEK293 cells by ref. 2 (Fig. 6c) although different UV irradiation strategies were used (254 vs. 365 nm cross-linking; p-value: 1.26 × 10–139). Importantly, the largest overlap was with RBPs recently found in HeLa cells using the also unbiased RICK technique12 (94% for high confidence RBPs and 86% of non-poly-A RBPs, respectively) and the CARIC approach (84% overlap)13 (Fig. 6c). Recent studies show that the boundaries between RNA- and DNA-binding are rather blurry and nuclear DNA-binders were found to interact with RNA37,38,39,40. We identified proteins involved in replication and response to DNA damage (Supplementary Data 2) such as DDX5438 but in general, DNA-binders such as transcription factors were underrepresented, demonstrating that PTex does not select for DNA-specific binding proteins in particular (Fig. 6c). To rule out that previously described RBPs are more efficiently recovered in PTex than the newly identified RNA-associated proteins (which could indicate carry-over of proteins unrelated to RNA interactions), we compared the distribution of the HEK293 mRNA-binding proteins2 in the PTex enrichment. The established RBPs are similarly enriched along the dynamic range (from no enrichment to log2FC PTex [CL/-CL] of 6; Fig. 6d) of PTex and hence display no difference to the novel RNA-interactors.
The presented results demonstrate that PTex is specific for RNPs. But why were not the same proteins discovered to associate with RNA before? The majority of recently discovered RBPs are interacting with mRNA;8 a RNA class which is highly heterogeneous in its sequence but represents only ∼5% of the cellular RNA pool. The differences in between interactome capture (poly-A RNPs) and PTex (RNPs in general) is best demonstrated in the case of the eukaryotic RNA exosome41. The core exosome complex consists of 10 protein subunits (Exo-10) from which only one protein (Rrp44) is catalytically acting on RNA as exo- and endoribonuclease (Fig. 6f). The remaining nine proteins (Exo-9) are forming a barrel-like complex in which RNA can be channelled through before it is degraded by Rrp44, but Exo-9 proteins do not degrade or modify the RNA itself41,42. Still, all 10 subunits are positioned to directly interact with RNA and multiple interactions with the individual subunits have been demonstrated in high-resolution structure studies43,44. Hence, all 10 subunits are amenable to UV cross-linking and, as a result, 9 out of the 10 subunits were identified by PTex (Fig. 6g).
Previously it was demonstrated that interactome capture enriches for mRNA-binding proteins with a high isoelectric point (pI; Fig. 6e)1,3. However, sequence- and oligo-dT-independent approaches such as RICK12 or PTex identify more proteins with a pI<6. Proteins with a low pI are overall negatively charged at cellular pH and thus unlikely to interact with RNA in an unspecific manner due to electrostatic repulsion of likewise negatively charged RNA. Indeed, 7 of the Exo-10 protein subunits have the mRBP-untypical isoelectric point below pH 6 (Supplementary Fig. 24). Inside the central exosome channel, RNA of 30–33 nt or 9–10 nt length has been found in in vitro and in CRAC analyses;45 the former being of sufficient length for efficient recovery by PTex (Fig. 2c). In sum, PTex enriches for the (near) complete exosome core complex while interactome capture from the same cell line only found a single subunit2 (Fig. 6g).
Yet we also enrich for proteins which have no established role in RNA biology such as subunits of the human proteasome (Supplementary Data 2). However, from the 28 proteins of the 20S core complex, Psma5 and Psma6 were reported to display RNase activity in purified complexes46. Importantly, proteasome-associated RNAs were shown to lack poly-A stretches47 and ATPases of the AAA family in the 19S proteasome regulatory particle were found to be recruited to RNA polymerase I (rRNA) transcription sites48. Hence, none of the proteasome-related RNA activities are approachable via poly-A RNA-mediated purification. The proteasome is a multi-protein complex with structural similarities to the exosome41 and RNAs interacting with Psma5/6 are likely to be UV-cross-linked to other subunits as well.
A snapshot of RNA-associated proteins in bacteria
Prokaryotic mRNAs lack poly-A tails and are thus not approachable by oligo-dT-based methods. We used the pathogen Salmonella Typhimurium harbouring a chromosomally FLAG-tagged Hfq protein49 to test PTex in bacteria. Hfq is an abundant RNA-binder facilitating mRNA:ncRNA interactions in Gram-negatives49. As for animal tissue culture (see above), we used the slightly modified PTex protocol (Hot-PTex) in which RNP extraction from Salmonella grown to OD600 3.0 was performed at 65 °C thereby supporting cell lysis (Fig. 6a). Hfq is a 17-kDa protein and forms a homo-hexamer in bacterial cells; the complex has been shown to resist normal Laemmli PAGE conditions49 and is also visible in western blots (Fig. 6b). After UV irradiation, PTex purification and RNase treatment, we observe a shifted Hfq monomer band which we attribute to residual cross-linked RNA fragments resulting in a slightly higher molecular mass. The physiologically relevant Hfq hexamer is also strongly enriched compared to non-UV samples, indicating that also the complex is still bound to remaining RNA fragments.
We next used the Hot-PTex fraction of Salmonella cells to map RNA-associated proteins by MS (Fig. 6c). Comparing recovered protein intensities from UV-irradiated versus control cells (biological duplicates), we find 172 proteins (Supplementary Data 3), among them 33 ribosomal proteins, components of the RNA polymerase complex (subunit α, σ factor RpoD, DksA) and 4 out of the 5 established mRNA-binding proteins of Salmonella (Hfq, ProQ and CspC/CspE)49,50,51. One hundred and thirteen of the enriched proteins are so far unknown to interact with RNA. To validate our findings, we picked YihI, a putative GTPase-activating protein which was speculated to play a role in ribosome biogenesis52, the alkyl hydroperoxide reductase c22 protein (AhpC)53 and the cell invasion protein SipA54. Using mutant strains carrying FLAG-tags fused to the C-terminus of PTex candidate RNA-interactors, we performed UV cross-linking, immunoprecipitation and radioactive labelling of co-purified RNA (known as PNK assay as described in ref. 55), validating that YihI, SipA and AhpC are indeed associated to RNA in vivo (Fig. 6d, Supplementary Figs. 25, 26). In addition, we recently validated PTex-enriched ClpX and DnaJ as well56. We furthermore find proteins with known RNA-binding domains (RBDs) such as the nucleic acid binding OB-fold (present in RpsA, RpsL, RplB, CspC, CspE, Pnp, RNaseE, Ssb and NusA) and domains which were also detected in RBPs when screening eukaryotic cells: the afforementioned AAA ATPase fold in the ATP-dependent protease ATPase subunits HslU, ClpB, and ClpX, or thioredoxin domains as in AhpC, Thiol:disulphide interchange protein (DsbA) and Bacterioferritin comigratory protein (Bcp)1,3. As in many other species, we find glycolytic enzymes to associate to RNA (Pgk, Pgi)3,57. Using GO terms for functional annotation of the RNA-associated proteins, the most significant terms are “translation” and other terms connected to the ribosome as expected (Fig. 6e). To the best of our knowledge, the periplasmic space is generally considered to be devoid of RNA. We still recover RNA-associated proteins which localise to the outer membrane (Supplementary Data 3). As in the case of HEK293, we cannot distinguish here between RBPs that actively act on RNA and proteins which are associated to RNA for e.g. structural reasons. However, recent studies in several Gram-negative bacteria demonstrate that secreted outer membrane vesicles (OMVs) contain RNA which indicates that bacterial transcripts must be sorted to the outer membrane via a yet to be determined pathway (reviewed in ref. 58). It is tempting to speculate if the here identified proteins are involved in such a process or are mere bystander proteins.
Overall, we noticed that enrichments even from known RBPs were lower in Salmonella compared to HEK293 cells and we anticipate that additional modifications in UV cross-linking and/or cell lysis could improve sensitivity when applying (Hot-)PTex in bacteria. PTex is the first approach that can purify bacterial RNPs in an unbiased fashion without the necessity of immunoprecipitation or introduction of modifications (tag, overexpression, etc.), rendering it a tool for cell-wide RBP identification and studying bacterial RNA–protein interactions.
UV cross-linking of RNPs appears rather inefficient. Even after high UV doses only ∼1–10% of any given RBP can be covalently coupled to ribonucleic acids in human cell culture1,4 and yeast3,57,59. Importantly: this reflects both cross-linking efficiency and fraction of protein bound to RNA; in other words, how much of the protein is in a steady state associated with the RNA? What is of experimental interest is therefore only a minor fraction of the RBP (the one cross-linked to RNA), while the vast excess of protein stays in a non-cross-linked state.
With PTex, we are exploiting the physicochemical differences between cross-linked hybrid RNA–protein molecules on the one hand and the non-cross-linked proteins and RNA of a cell on the other for selective purification of complexes only. The method was designed to select against secondary RNA-binders by using denaturing and chaotropic conditions in which RNA–protein interactions which were not covalently cross-linked are not preserved14, and by selecting proteins which were enriched in a UV-dependent fashion. PTex is a fast and simple modular protocol which can be performed in about 3 h. Our approach is independent of the UV wavelength applied for irradiation (254/365 nm) and the type of biomaterial used (human/bacterial cell culture, animal tissue), and does not rely on presence of a particular RNA sequence such as poly-A tails. Cross-linked proteins of 30–50% could be recovered by PTex compared to the starting material (Fig. 2b–e). At the same time, PTex drastically improves the relative enrichment of cross-linked over free protein (Fig. 2f, g).
Next to PTex, RBR-ID60, RICK12 and CARIC13 are methods which allow for unbiased purification of RNPs. However, there are also differences: All, RBR-ID, RICK and CARIC rely on RNA labelling using either 4SU or 5-ethynyl uridine (EU) and subsequent UV irradiation. Using pulse/chase experiments, these methods thus allow to determine newly transcribed RNA species. For PTex, UV irradiation at 254 nm wavelength is sufficient (no labelling required) and we anticipate that this is an advantage when analysing biological material in which uptake of nucleotide analogues is either insufficient or too cost-intensive. It did not escape our attention however that PTex requires fewer input material compared to the other methods (Table 1). Since all of these methods rely on UV irradiation, however, biases introduced by different cross-linking efficiencies for individual proteins remain a general issue. During the revision of this manuscript, two related protocols were published termed OOPS61 and XRNAX62. Both also exploit organic biphasic extraction as principle to purify RNPs.
Our work provides a cell-wide analysis of the effects of different UV irradiation dosage on RNA-protein cross-linking (Supplementary Data 1, Supplementary Fig. 23). We hope that this resource will aid researches to establish suitable conditions for cross-linking of individual RNPs. The decrease in recovery of proteins after using 1.5 J/cm2 254 nm light (Supplementary Fig. 23) demonstrates that extensive irradiation/cross-linking can have adverse effects on protein recovery. Next to the observed RNA degradation, cross-linked peptides released by tryptic digestion are notoriously difficult to identify in MS experiments and hence increasing the amount of protein cross-linking might negatively impact protein identification63. This is not the case for PTex-purified and input control transcripts in which deletion mutations accumulate at high UV settings which can then serve as marker for protein interaction sites (Fig. 5b, c)9,10,11. Our results demonstrate that when investigating RNPs on a global scale, the cross-linking strategy should be adapted to the biological question: not all proteins are interacting with RNA and increasing the UV dose can be disadvantageous for RBP recovery since severely degraded RNA will cause less-efficient purification by PTex and cross-links will impair identification by MS. Conversely, almost all RNA can be expected to be bound by a set of proteins under physiological conditions23,24 which explains why we also observe an increase in mutations in input RNA (Fig. 4b, Supplementary Figs. 17, 18). In contrast to protein recovery, (partial) in vivo RNA degradation will not impair recovery of cross-linked transcripts since RNase treatment/RNA fragmentation is part of CLIP and RNA-Seq workflows already and cross-linked RNA will not be lost during cDNA preparation or sequencing9,10,11.
Our findings indicate that up to a third of a cell’s proteins can associate with RNA in vivo which raises the question of the underlying biological function of these interactions. In this respect, it is intriguing to see that the Exo-9 proteins are interacting with and can be cross-linked to RNA although none of these subunits display RNase activity themselves. By increasing the detection efficiency for UV-cross-linked complexes e.g. by recovery of proteins interacting with RNA as short as 30 nt, we now have to separate classical RBP functionalities such as RNA degradation, transport or modification from RNA-interactors which are in physical contact with RNA due to structural organisation as in the case for ribosomal or exosome proteins. With PTex, we have developed a tool for fast recovery of RNPs in general which will allow us interrogate the functionality of individual proteins in RNA biology. So far, eukaryotic proteomes have been extensively scrutinised for RNA-binding proteins in the recent years. The two other kingdoms of life - archea and prokaryotes - could not be investigated for technical reasons. We here provide a RNA-bound proteome from Salmonella Typhimurium, demonstrating that PTex will now allow to expand global RNP analysis to species in all three branches of the tree of life.
Human cell culture and in vivo cross-linking
Human embryonic kidney cells (HEK293, a gift from Markus Landthaler, Max Delbrück Center for Molecular Medicine, Berlin, Germany) were grown on 148 cm2 dishes using DMEM high glucose (Dulbecco’s Modified Eagle, glucose 4.5 g/L, Gibco, 41966–029) supplemented with 10% bovine serum (Gibco, 10270–106), penicillin/streptomycin (100 U/mL 0.1 mg/mL; Gibco, 15140–122) at 37 °C with 5% CO2. After reaching 80% confluence, cells in monolayer were washed once with cold phosphate buffer saline (DPBS; Gibco, 10010–015) and placed on ice. Then, DPBS was removed completely and cells were irradiated with 0.015–1.5 J/cm2 UV light (λ = 254 nm) in a CL-1000 ultraviolet cross-linker device (Ultra-Violet Products Ltd), collected in 15 mL tubes, pelleted by centrifugation (1000×g, 3 min, 4 °C), aliquoted in 2 mL tubes and stored at −20/−80 °C (CL). Non-irradiated cells were used as non-cross-link control (-CL). In addition, the potential UV damage on the RNA after exposure to the different radiation energies was assessed: RNA isolated from HEK293 cells before and after exposure UV light by phenol extraction14 were analysed with the RNA 6000 Chip Kit in a Bioanalyzer 2100 (Agilent, 5067–1513).
Bacterial cell culture and in vivo cross-linking
Salmonella Typhimurium SL 1344 Hfq::x3FLAG (a gift from Jörg Vogel,49) was grown on LB medium to stationary phase (OD600 = 3). Aliquots of 20 ml were pelleted (20,000×g, 8 min, 37 °C) and resuspended in 1/10 of water for UV irradiation. Cells were cross-linked on ice with 5 J/cm2 UV light (λ = 254 nm) in a CL-1000 ultraviolet cross-linker device (Ultra-Violet Products Ltd), snap-frozen and stored at −80 °C. Bacterial suspensions equivalent to 2.5 ml of initial culture were used as input in Fig. 7b. Salmonella enterica subsp. enterica Serovar Typhimurium strain SL1344 used for the mapping of RNA-protein interactions was grown in LB medium to OD600 2.0. Half of the cultures were cross-linked in a Vari-X-linker (UVO3, www.vari-x-link.com), using UV light (λ = 254 nm) lamps for 90 s. Fractions of 10 ml from each, cross-linked and non-cross-linked cultures, were harvested by filtration64.
Construction of bacterial strains
Yihi::x3FLAG::KmR was constructed following the procedure based on the Lambda Red system developed by ref. 65. The system is based on two plasmids: pKD46, a temperature-sensitive plasmid that carries gamma, beta and exo genes (the bacteriophage λ red genes) under the control of an Arabinose-inducible promoter, and pSUB11, carrying the x3FLAG::KmR cassette. The cassette in pSUB11 was PCR-amplified with primers (forward: 5′-GAA GCA GGA AGA TAT GAT GCG CCT GCT AAG AGG CGG CAA CGA CTA CAA AGA CCA TGA CG-3′ and reverse: 5′-GGG TTA TAA GCA GGA CGG GCA AGC CCA CGG TGT AAA CCC GCA TAT GAA TAT CCT CCT TAG-3′), the 5′ ends of which were designed to target the 3′ end of the gene of interest, digested with DpnI at 37 °C for 1 h and, upon purification, used for subsequent electroporation. Similarly, AhpC::6xHis-TEV-3xFLAG::TetR and SipA::6xHis-TEV-3xFLAG::TetR constructs were produced by amplification of the plasmid pJet1.2-Hfq-HTF-TetR66 with the primers AhpC-forward (5′-AAA GAA GGC GAA GCG ACT CTG GCT CCA TCC TTA GAC CTG GTC GGT AAA ATC CGC TCT GC TGG ATC CAT GGA G-3′) and AhpC-reverse (5′-GTG AGC AGG CGA CGC CAA CGC AGC TAT GGC GTG AAA GAC GAC GGA AAT TTA CGC GTG AGG GGA TCT TGA AG-3′) or SipA-forward (5′-CCT GGC GTG GAT CGG GTT ATT ACT ACC GTT GAT GGC TTG CAC ATG CAG CGT CGC TCT GCT GGA TCC ATG GAG-3′) and SipA-reverse (5′-TTT GAC TCT TGC TTC AAT ATC CAT ATT CAT CGC ATC TTT CCC GGT TAA TTA CGC GTG AGG GGA TCT TGA AG-3′). Prior to electroporation, PCR products were digested with Exo1 and DpnI during 1 h at 37 °C followed by ethanol precipitation and verification on Agarose gels.
Salmonella Typhimurium SL1344 harbouring the plasmid pKD46 was grown in LB containing Ampicillin (100 μg/ml) and l-Arabinose (100 mM) at 28–30 °C to an OD600 of 0.8. Cells were incubated on ice for 15 min, centrifuged at 3220×g for 5 min at 4 °C and resuspended in ice-cold water. The wash was repeated three times. On the final wash, cells were resuspended in 300 μl water and electroporated with 200 ng of PCR product. Cells were recovered for 1 h in LB at 37 °C on a tabletop thermomixer at 600 rpm, plated on LB agar with Kanamycin (50 μg/ml) or Tetracycline (100 μg/ml) overnight. The following day, 10 colonies per strain were picked, resuspended in PBS and streaked on plates containing Ampicillin or Kanamycin (or Tetracycline) and incubated at 37–40 °C. Colonies that showed resistance to Kanamycin (or Tetracycline) but not to Ampicillin were selected for further analysis, and the correct expression of the epitope tag was verified by western blot.
HEK293 suspensions in 600 μl of DPBS (5–8 × 106 cells, ±CL) were mixed with 200 μl of each: neutral phenol (Roti-Phenol, Roth 0038.3), toluol (Th.Geyer, 752.1000) and 1,3-bromochloropropane (BCP) (Merck, 8.01627.0250) for 1 min (21 °C, 2.000 rpm, Eppendorf ThermoMixer) and centrifuged 20,000×g 3 min, 4 °C. The upper aqueous phase (Aq1) was carefully removed and transferred to a new 2 ml tube containing 300 μl of solution D (5.85 M guanidine isothiocyanate (Roth, 0017.3); 31.1 mM sodium citrate (Roth, 3580.3); 25.6 mM N-lauryosyl-sarcosine (PanReac AppliChem, A7402.0100); 1% 2-mercaptoethanol (Sigma)). Then, 600 μl neutral phenol and 200 μl BCP were added, mixed and centrifuged as before. After phase separation, the upper 3/4 of Aq2 and the lower 3/4 of Org2 were removed. The resulting interphase (Int2) was kept in the same tube and mixed with 400 μl water, 200 μl ethanol p.a., 400 μl neutral phenol and 200 μl BCP (1 min, 21 °C, 2000 rpm, Eppendorf ThermoMixer) and centrifuged as previously. Aq3 and Org3 were carefully removed, while Int3 was precipitated with 9 volumes of ethanol (−20 °C, 30 min to overnight). Samples were centrifuged during 30 min at 20,000×g, pellets dried under the hood for max. 10 min and solubilised with 30 μL Laemmli buffer at 95 °C for 5 min; or the indicated buffer/temperature according to the downstream application. For global mapping of RNA-protein interactions in HEK293 cells, 3 replicates from each UV irradiation energy were used. A detailed step-by-step protocol for PTex is available in the Supplementary Methods.
Hot-PTex from mouse tissue
Mice brain samples were a kind gift from M. Preußner (Heyd lab, FU Berlin). All animal experiments with C57BL/6 mice were performed in accordance with institutional and governmental recommendations and laws (Approval no. T0311/13). Mouse-brain tissue (260 mg) was disrupted by cryogenic grinding. Immediately after, 130 mg were irradiated with 0.75 J/cm2 UV light (λ = 254 nm) in a CL-1000 ultraviolet cross-linker device (Ultra-Violet Products Ltd). Samples (±CL) were resuspended in DPBS on ice. 600 μl aliquots (32.5 mg) were mixed during 5 min at 65 °C in the presence of 0.5 g of low-binding zirconium beads (100 μ, OPS Diagnostics, LLC) and the phenol-toluol-BCP mix described above. After centrifuging (20,000×g, 3 min, 4 °C) Aq1 was carefully removed. Consecutive extractions were done as described above with the solely difference that mixing steps were performed at 65 °C (2000 r.p.m, Eppendorf ThermoMixer).
Hot-PTex from Salmonella
For global mapping of RNA–protein interactions in Salmonella cells from two biological replicates were used. Bacteria attached to filters obtained as described in the section Bacterial cell culture and in vivo cross-linking were collected with 12 ml of DPBS and aliquots of 4 ml were pelleted at 20,000×g, 2 min, 4 °C. A modification of the step 1 was introduced in HOT-PTex in order to improve removal of free proteins, as follows: bacterial pellets (±CL) resuspended in 400 μl of DPBS supplemented with EDTA (5 mM) were mixed with phenol-toluol-BCP (200 μl each), and 0.5 g of zirconium beads during 5 min at 65 °C (2000 rpm, Eppendorf ThermoMixer). After centrifugation at 20,000×g, 3 min, 4 °C, the upper aqueous phase (Aq1) was mixed again with the same volumes of phenol-toluol-BCP (without beads) during 1 min at 65 °C. Then the aqueous phase was carefully transferred to a third tube where steps 2 and 3 of the PTex protocol were performed at 65 °C.
Analysis of individual PTex steps
PTex extractions were carried as described before, using sets of three tubes containing synthetic 30–50 nt RNAs 5′́-labelled 32P-ATP, 200 ng pUC19 LacZ-containing fragment (817 bp, generated by DrdI, NEB) or 2–3 × 106 HEK293 cells spiked-in with 0.25 μg of Sxl-RBD4 per tube. PCRs were performed using primers designed to amplify endogenous chromosomal DNA, I l-3 gene (574 bp, forward primer: 5′-GAT CGG ATC CTA ATA CGA CTC ACT ATA GGC GAC ATC CAA TCC ATA TCA AGG A-3′ and reverse primer: 5′-GAT CAA GCT TGT TCA GAG TCT AGT TTA TTC TCA CAC-3′), or the LacZ gene present in the pUC19 linear fragment (324 nt, forward 5′- AGA GCA GAT TGT ACT GAG-3′ and M13-reverse 5′́-CAG GAA ACA GCT ATG ACC-3′). DNA and RNA samples were electrophoresed in Agarose 1.0–1.5% or TBE-Urea PAGE 12%, respectively. Radioactivity was detected by phosphoimaging, while Sxl-RBD4, endogenous HuR or ACTB in HEK293/Sxl-RBD4 spiked-in proteins were analysed by western blot with specific antibodies as described in the western Blotting section.
RNaseA digestion and electrophoretic mobility assay
Cross-linked and non-cross-linked HEK293 cell suspensions were subjected to PTex, the resulting pellets were solubilised in 150 μl of buffer TED (20 mM Tris, 1 mM EDTA, 0.03% DDM [n-Dodecyl β-d-maltoside]) at 56 °C during 20 min, followed by incubation at 75 °C during 20 min. Samples were mixed with RNaseA (2 ng) and incubated at 37 °C; aliquots of 20 μL were taken at different time points: 0, 1, 5, 10, 30 and 60 min, immediately mixed with 5 μL of 6x Laemmli buffer, heated at 95 °C for 5 min and used for SDS-PAGE.
RNase treatment prior PTex
For PTex extractions shown in Fig. 1g, suspensions of 2–3 × 106 HEK293 cells/ml (±CL) were treated with 2000 U/ml benzonase (Merck, 70664) in the recommended buffer (50 mM Tris, 1 mM MgCL2, pH 8.0) during 1 h at 37 °C and 1000 rpm (ThermoMixer, Eppendorf). Untreated cells (±CL) were used as controls. PTex extraction was then performed as described above. After ethanol precipitation, pellets were directly solubilised in 40 μl Laemmli buffer (2×). SDS-PAGE and western blots were performed as indicated below.
In vitro transcription of Sxl-RBD4 target RNAs
The T7 promoter and a sxl-target DNA sequence 5′-GAT CCG GTC ATA GGT GTA AAA AAA GTC TCC ATT CCT ATA GTG AGT CGT ATT AA-3′ was cloned into pUC19 using the restriction enzymes BamHI and HindIII. The resulting plasmid was named pUC19-sxl-target. Templates for RNAs of 87 and 191 nt length were generated using DNA restriction fragments from the pUC19-sxl-target plasmid (HindIII + EcoRI − 87 bp; HindIII + PvuI − 191 bp). The 30 nt RNA was synthesised as described before67 by hybridising two complementary sequences containing the T7 RNA polymerase promoter 5′-TAA TAC GAC TCA CTA TAG-3′ and the template sequence 5′-GGT CAT AGG TGT AAA AAA ACT CTC CAT TCC TAT AGT GAG TCG TAT TAA-3′, followed by T7 run-off transcription. T7 RNA polymerase and restriction enzymes were purchased from New England Biolabs. Plasmids were purified using the NucleoBond Xtra Midi kit (Macherey-Nagel, 740410.100), DNA fragments by NucleoSpin Gel and PCR clean-up (Macherey-Nagel, 740609.50) and RNA by acidic phenol extraction14. RNA 5′-GAG UUU UUU UAC A-3′ (13 nt) was synthesised by Biomers (Ulm, Germany).
In vitro cross-linking assays
Sxl-RBD4 (40 μg) in 100 μl cross-linking buffer (CLB: 10 mM Tris pH 7.4, 50 mM KCl, 1 mM EDTA, 1 mM DDT) were mixed with in vitro-transcribed RNA (13, 30, 87 and 191 nt, 1.7–10 μM) harbouring one copy of the target motif 5′-GAG UUU UUU UAC A-3′, incubated at 4 °C for 30 min and cross-linked with 0.25 J/cm2 of UV-254 nm, on ice. Afterwards, 98% of each sample was used for PTex extraction, while 2% of the sample were kept as input control for SDS-PAGE and western blotting to detect Sxl-RBD4.
Quantification of PTex
In order to unbiasedly determine the overall yield of PTex, we prepared mRNA interactome capture (RIC) samples from +CL HEK2931 to serve as a only-clRNPs starting material. RIC samples from five biological replicates were used for PTex extraction. After ethanol precipitation, PTex samples were washed once with 5 ml of cold ethanol to remove traces of phenol which could interfere with the quantification, and resuspended in 20–50 μl of water. The absorbance of RIC and PTex samples were measured at λ280 nm and λ260/280 nm in a Nanodrop 2000. In addition, 2 and 45% of RIC and PTex samples, respectively, were digested with RNaseA (0.1 μg/μl) at 37 °C during 40 min. Intact and digested samples were loaded in Bis-Tris-MOPS gels 4–12% (NuPage, Invitrogen), transferred to nitrocellulose membranes and blotted to detect HuR (see below). A selection of western blot images generated during this study (Supplementary Figs 13,14) were used to calculate the performance of PTex in terms of yield and specific clRNP enrichment by densitometry analysis using ImageJ68.
Western blotting was performed following standard techniques. Samples were electrophoresed on SDS-PAGE gradient gels 4–20% (TGX stain free, BioRad) or Bis-Tris 4–12% (NuPAGE, Invitrogen) and proteins transferred onto nitrocellulose membranes 0.2 μm (BioRad). Membranes were blocked during 30 min with PBST-M (10 mM phosphate, 2.7 mM potassium chloride, 137 mM sodium chloride, pH 7.4 0.1% tween 20 (Sigma), 5% milk) and incubated with 0.1–1.0 μg/ml of the respective antibody overnight at 4 °C (or 2 h room temperature). Primary antibodies targeted the proteins HuR (1:1000, Proteintech, 11910–1-AP), ABCF2 (1:1000, Proteintech, 10226-1-AP), CCT7 (1:1000, Proteintech, 15994-1-AP), FUS (1:1000, abcam, ab124923), GAPDH (1:1000, Proteintech, 10494-1-AP), alpha-enolase (ENO1, 1:1000, Proteintech, 11204-1-AP), PTBP1 (1:1000, abcam, ab133734), PABPC1 (1:1000, Proteintech, 10970-1-AP), ACTB (1:1000, Proteintech, 66009-1-Ig), Histone H3 (1:1000, abcam, ab21054), FLAG-tag (1:5000, Sigma, A8592), or Sxl-RBD4 (DHSB, anti-Sxl hybridome culture supernatant M114, 1:20). Monoclonal mouse anti-Sxl antibodies (M18 and M114) were developed by P. Schedl and obtained from the Developmental Studies Hybridoma Bank (DHSB, created by the NICHD of the NIH and maintained at The University of Iowa, Department of Biology, Iowa City, IA52242). Antibody binding was detected using anti-mouseHRP (1:5000, Proteintech, SA00001-1), anti-mouse-AlexaFluor680 (1:2000, Invitrogen, A32729) anti-rabbitHRP (1:2500, Proteintech, SA00001-2), or anti-rabbit-AlexaFluor488 (1:2000, Invitrogen, A32731) and Clarity ECL Western Blotting Substrate for chemiluminescence in a ChemiDocMP imaging system (BioRad).
Immunoprecipitation and PNK assay
Immunoprecipitation of bacterial FLAG-tagged proteins and radioactive labelling of RNA by PNK was performed as described:49 Cell pellets were resuspended in 800 µl NP-T buffer (50 mM NaH2 PO4, 300 mM NaCl, 0.05% Tween, pH 8.0) together with 1 ml glass beads (0.1 mm). Cells were lysed by shaking at 30 Hz for 15 min at 4 °C and centrifuged for 15 min at 16,000×g and 4 °C. Cell lysates were transferred to new tubes and centrifuged for 15 min at 16,000×g and 4 °C. The cleared lysates were mixed with one volume of NP-T buffer with 8 M urea, incubated for 5 min at 65 °C while at 900 rpm and diluted 1:10 in ice-cold NP-T buffer. Anti-FLAG magnetic beads (Sigma) were washed three times in NP-T buffer (30 µl 50% bead suspension was used for a lysate from 100 ml bacterial culture), added to the lysate, and the mixture was rotated for 1 h at 4 °C. Beads were collected by centrifugation at 800×g, resuspended in 1 ml NP-T buffer, transferred to new tubes, and washed twice with high-salt buffer (50 mM NaH2 PO4, 1 M NaCl, 0.05% Tween, pH 8.0) and twice with NP-T buffer. Beads were resuspended in 100-µl NP-T buffer containing 1 mM MgCl2 and 2.5 U benzonase nuclease (Sigma) and incubated for 10 min at 37 °C while shaking at 800 rpm, followed by 2 min incubation on ice.
After one wash with high-salt buffer and two washes with CIP buffer (100 mM NaCl, 50 mM Tris-HCl pH 7.4, 10 mM MgCl2), the beads were resuspended in 100 µl CIP buffer with 10 units of calf intestinal alkaline phosphatase (NEB) and incubated for 30 min at 37 °C while shaking at 800 rpm. This was followed by one wash with high-salt buffer and two washes with PNK buffer (50 mM Tris-HCl pH 7.4, 10 mM MgCl2, 0.1 mM spermidine). Next, beads were resuspended in 100 µl PNK buffer and 1 U T4 PNK (0.1 U/µl, NEB) and 5.5 µCi γ 32P-ATP were added and incubated at 37 °C for 30 min. The beads were then washed twice in PNK buffer and resuspended in 50 µl of a 2× denaturing gel loading buffer (Invitrogen). 30 microliter samples were then analysed on a NuPAGE Bis-Tris gel (Invitrogen) and radioactive signal was detected in a Life Science FLA-5100 imaging system (Fujifilm).
Recombinant Sxl-RBD4 protein (Sxl amino acids 122–301) was purified essentially as described before17. In brief, after IPTG induction for 4 h at 23 °C in E. coli (BL21Star [Invitrogen] transformed with the Rosetta 2 plasmid [Merck]), cells were lysed in a buffer containing 20 mM Tris-HCl pH 7.5, 1 M NaCl, 0.2 mM EDTA, 1 mM DTT, cOmplete Protease Inhibitor Cocktail [Roche] followed by centrifugation for 20 min at 12,000×g. The cleared lysate was then subjected to GSH-affinity chromatography using an ÄKTA FPLC system. Bound protein was eluted in a buffer containing 100 mM HEPES/KOH pH 8.0, 50 mM glutathione, 50 mM KCl and 1 mM DTT. Fractions containing the protein were supplemented with 3 C protease and dialysed overnight against IEX buffer (20 mM HEPES/KOH pH 8.0, 50 mM KCl, 10% Glycerol, 0.2 mM EDTA, 0.01% NP-40) followed by ion exchange chromatography using a MonoS column. Fractions containing the pure protein were pooled, dialysed against storage buffer (20 mM HEPES/KOH pH 8.0, 20% Glycerol, 0.2 mM EDTA, 0.01% NP-40, 1 mM DTT) and stored at 80 °C.
MS sample preparation
Human and bacterial cells were cultivated as described above and pellets used in PTex protocol. A minor fraction of initial input material was lyzed and proteins denatured in 1% SDS and 0.1 M DTT Phosphate Buffer Solution (PBS) by boiling for 10 min at 95 °C. After cooling, samples were treated with Benzonase for 30 min at 37 °C, spun down to remove cell debris and supernatant containing cellular proteins transferred to fresh tubes. Remaining input material was used for RBPs enrichment with PTex as described above. After PTex, RBPs were precipitated in 90% ethanol solution at −20 °C and 30 min centrifugation at 20,000×g at 4 °C. Protein pellets were resuspended in 2 M urea in 50 mM ammonium bicarbonate (ABC) buffer and Benzonase was added for 30 min at 37 °C to digest RNA. Proteins were precipitated with methanol-chloroform extraction69 and resuspended in 8 M urea and 0.1 M Tris pH 8 solution. Proteins were reduced with 10 mM DTT at room temperature for 30 min and alkylated with 55 mM iodoacetamide at room temperature for 30 min in the dark. Proteins were first digested by lysyl endopeptidase (LysC) (Wako) at a LysC-to-protein ratio of 1:50 (w/w) at room temperature for 3 h, diluted to 2 M final concentration of urea with 50 mM ammonium bicarbonate, and further digested with trypsin (Promega) at a trypsin-to-protein ratio of 1:50 (w/w) under constant agitation at room temperature for 16 h. Peptides were desalted with C18 Stage Tips70 prior to LC-MS/MS analysis. Peptides were separated on a 2 m long monolithic column (100 µm ID, MonoCap C18 High Resolution 2000 [GL Sciences] kindly provided by Dr. Yasushi Ishihama [Kyoto University]), or on an in-house packed C18 15 cm microcolumns (75 µm ID, packed with ReproSil-Pur C18-AQ 3-µm resin, Dr. Maisch GmbH). Peptides were eluted from columns with an increasing acetonitrile concentration gradient of 5–50% at a flow rate of 200 nl/min for 6 h (long column) or 300 nL/min for 2 h (short column). Q Exactive Plus instrument (Thermo Fisher Scientific) was operated in the data dependent mode with a full scan in the Orbitrap followed by top 10 MS/MS scans using higher-energy collision dissociation (HCD). Full scans were performed with a resolution of 70,000, a target value of 3 × 106 ions and a maximum injection time of 20 ms. The MS/MS scans were performed with a 17,500 resolution, a 1 × 106 target value and a 60 ms maximum injection time. Isolation window was set to 2 and normalised collision energy was 26. Ions with an unassigned charge state and singly charged ions were rejected.
All raw files were analysed with MaxQuant software (v18.104.22.168)71 using the label-free quantification (LFQ) algorithm72 with default parameters and match between runs option on. Search parameters included minimum peptide length of 7, two missed cleavage sites, protease specificity: trypsin (allowing also for cuts after proline), cysteine carbamidomethyl fixed modification and variable modifications including methionine oxidation and protein N-terminal acetylation. The mass tolerance for precursor ions was 6 ppm and the mass tolerance for fragment ions was 20 ppm. Database search was performed against the human reference proteome (UNIPROT, downloaded in October 2014) or the Salmonella Typhimurium reference proteome (UNIPROT, downloaded in August 2017) with common serum and enzyme contaminants sequences included. A minimum peptide count required for protein quantification was set to two. False discovery rate (FDR) was set to 1% at peptide spectrum match (PSM) and protein levels.
Bioinformatic analysis of PTex-purified proteins (HEK293)
The complete analysis is available as a supplementary R notebook (Supplementary Data 8). In short, we used LFQ MS intensities normalised to trypsin (which is constant in all samples). Potential contaminants, reverse and peptides only identified by modification were excluded from analysis. Fold changes were calculated by subtraction of the log2 values of LFQ intensity for proteins from UV cross-linked samples and non-cross-linked samples. Only proteins which were found in all replicates were processed further. Enrichment (CL/−CL) was calculated as described before:3,59 P-values were calculated from an Ebayes moderated t-test using the limma package73 followed by Benjamini–Hochberg False Discovery Rate (FDR) correction. Only proteins with an adjusted p-value of 0.01 or smaller in all 3 cross-linking intensities were considered being enriched. GO analysis was performed using PANTHER V.1174. Domain enrichment was done using DAVID75 searching the SMART76 database.
Bioinformatic analysis of PTex-purified proteins (Salmonella)
The complete analysis is available as supplementary R notebook (Supplementary Data 9). We used iBAQ-normalised values for the Salmonella analysis. Potential contaminants, reverse peptides and peptides only identified by modification were excluded from analysis. Fold changes were calculated by subtraction of the log2 values of iBAQ intensity for proteins from UV cross-linked samples and non-cross-linked samples. Only proteins which were found in both replicates were taken into account (258 proteins; 172 with a log2 fold-change >0). Domain and GO terms were analysed using DAVID75.
RNA-Seq sample preparation
Cross-linked PTex samples and non-cross-linked control (-CL, 0.015, 0.15, 1.5 J/cm2) were proteinase K digested (1 h, 56 °C) and the RNA recovered by acidic phenol extraction14 using phase lock gel tubes (5Prime, 2302830). Libraries were created according to the TruSeq Stranded Total RNA LT protocol (Ilumina, 15032612) with the modification that we skipped the rRNA depletion step. We used adaptors AR002,4,5-7,12,13,16,18,19. DNA concentration was determined by Qubit 3.0 Fluorometer (Life Technologies) and the quality of libraries assessed by a Bioanalyzer 2100, DNA 1000 Chip Kit (Agilent, 5067–1504). Sequencing was performed on a Illumina HiSeq4000. Note that we sequenced each condition in triplicates with the exception of input 0.015 J/cm2 for which we have duplicates.
Bioinformatic analysis of PTex-purified RNA
RNA-Seq data were quality controlled using fastqc (v0.11.2). We then mapped obtained reads against a single copies of human rRNA and tRNA sequences using bowtie2 (v2.2.6) with the following command line:
bowtie2 -no-unal -un $LIBRARY_rRNA_not_aligned_reads.fastq -al $LIBRARY_rRNA_aligned_reads.fastq -x rRNA_db -U, $LIBRARY.fastq > $LIBRARY_rRNA.sam 2», $LIBRARY_rRNA_alignment_stats.txt
We then used the remaining reads and mapped reads to the human genome (GRCh38.p12) and the corresponding comprehensive gene annotation file (gencode.v28.chr_patch_hapl_scaff.annotation.gtf) from GENCODE using the STAR aligner (v020201) with the following command line:
STAR -genomeDir PATH/TO/index -readFilesIn $LIBRARY_rRNA_not_aligned_reads.fastq -quantMode GeneCounts
After aligning the remaining reads with STAR, samtools was used to annotate each read in the bam files with mutation information. Using the SAM.py parser from the pyCRAC suite25, chromosomal locations of substitutions and deletions were extracted and counted. Only mutations that were unique to the UV-treated samples were considered. To normalise the data for sequencing depth, for each dataset the counts for substitutions and deletions were divided by the total number of mapped nucleotides, which provided an indication of mutation frequencies. To map the distribution of deletions and substitutions around AUG and Stop codons, CDS coordinates from the gencode.v28.chr_patch_hapl_scaff.annotation.gtf annotation files were extracted. Tables containing counts and chromosomal positions for each substitution and deletion were converted into gene transfer format (GTF) files using the GTF2 and NGSFormatWriters classes from the pyCRAC package. Subsequently, pyBinCollector from the pyCRAC package was used to map the distribution of substitutions and deletions around start and stop codons of protein-coding genes. The following command lines were used:
pyBinCollector.py -f mutations.gtf -gtf, annotationfile_CDS_coordinates.gtf -s 5end -a protein_coding -normalise -v -o dist_around_AUG.txt
pyBinCollector.py -f mutations.gtf -gtf, annotationfile_CDS_coordinates.gtf -s 3end -a protein_coding-normalise-v-o dist_around_STOP.txt
For the feature counts, count tables generated by STAR were used in conjunction with the gencode.v28.chr_patch_hapl_scaff.annotation.gtf annotation file.
PAR-CLIP and pCLIP
We performed the PAR-CLIP protocol as described:11 HEK293 cells stably expressing FLAG-tagged HuR (ELAVL1), were grown until 90% confluence. The last 16 h of incubation, 200 mM 4SU was added. Living cells were irradiated with 0.15 J/cm2 365 nm UV light, snap-frozen on dry ice and stored at −80 °C until use. Cells were collected on different days, representing biological replicates. Cells (∼1.2 × 108 cells/replicate) were lysed on ice for 10 min with 3 ml lysis buffer (50 mM Tris-Cl pH 7.5 (Life Tech., 15567027)), 100 mM NaCl (Life Tech. AM9760G), 1% (v/v) Nonidet P40 substitute (Sigma 74385), 0.5% (v/v) Sodium deoxycholate (AppliChem No. A1531) containing 0.04 U/ml RNasin (Promega, N2515) and 2x Complete Protease Inhibitor (Roche, 11697498001) and centrifuged 20,000×g, 10 min, 4 °C. Cleared lysates (1.5 ml/replicate) were digested with 8 U/ml TURBO DNase (ThermoFisher, AM2238) and 2 U/μl RNase I (ThermoFisher, AM2294) at 37 °C for 4 min (replicate 1) or 3 min 15 sec (replicates 2 and 3). FLAG-tagged HuR was immunoprecipitated with 10 μg of anti-FLAG monoclonal antibody (Sigma, F1804) bound to 40 μl of Protein G Dynabeads (Life Tech, 10004D). After extensive washes with high-salt buffer (50 mM Tris-HCL pH 7.5, 1 mM EDTA, 1% Nonidet P40, 0.1% (v/v) SDS, 1 M NaCl) beads were incubated with 1 U/μl of T4 polynucleotide kinase (T4-PNK, NEB) and 0.5 μCi/μl of 32P-ATP. After radiolabelling, samples were split into 4 tubes and underwent three versions of CLIP (classic PAR-CLIP, PAR-CLIP on-beads and pCLIP).
In classic PAR-CLIP, clHuR-RNA complexes were resolved by 4–12% Nu-PAGE MOPS (Invitrogen) transferred to nitrocellulose and excised at a defined size-range (50–60 kDa). Proteins were digested from the membrane with proteinase K and the RNA recovered by acidic phenol/chloroform extraction and ethanol precipitation. Recovered RNA was ligated with 3′ adaptor (5′-App-NNN NTG GAA TTC TCG GGT GCC AAG G-3′InvdT) gel-slice isolated, ligated with the 5′ adaptor (5′-GUU CAG AGU UCU ACA GUC CGA CGA UCN NNN-3′) and purified again from PAGE-Urea gels by elution and ethanol precipitation11.
As an alternative, we performed PAR-CLIP on-beads. Here, the ligation of the 3′/5′ adaptors can be achieved directly on the beads used in the affinity capture of the selected clRNP21,22. On-beads adaptors ligation was done by incubating the FLAG-clHuR-RNA beads with the 3′ adaptor in the presence of Rnl2(1–249)K227Q ligase and PEG-8000 overnight at 4 °C. After washes, the 5′ adaptor was ligated using the Rnl1 enzyme, 2 h at 37 °C as described in ref. 22. Followed by protein electrophoresis (4–12% Nu-PAGE MOPS, Invitrogen), blotting to nitrocellulose membranes and band excision at the defined size-range (50–60 kDa). Proteins were digested from the membrane with proteinase K. RNA was then recovered by acidic phenol/chloroform extraction and ethanol precipitation.
pCLIP beads capturing clHuR-RNA complexes were subjected to 3′/ 5′ adaptors ligation as above. Immediately after the 5′ adaptor ligation beads were magnetically captured and resuspended in 600 μl solution D. After a short denaturation (95 °C during 10 min), beads were separated with a magnet and the clHuR-RNA complexes recovered in the supernatant were subjected to the last two steps of the PTex protocol (20 min). PTex recovered interphases were precipitated with 9 volumes of ethanol on dry ice during 30 min, followed by 30 min centrifuging at 20,000×g, 4 °C. Ethanol-precipitated pellets were digested with proteinase K and RNA isolated by phenol/chloroform extraction.
Library preparation and RNA-Seq
RNAs obtained from the three PAR-CLIP procedures (classic, on-beads and pCLIP) were reverse-transcribed into cDNA with the reverse transcription primer 5′-GCC TTG GCA CCC GAG AAT TCC A-3′ and the minimal PCR cycles were determined for each case. cDNA libraries were created by PCR using the forward primer 5′- AAT GAT ACG GCG ACC ACC GAG ATC TAC ACG TTC AGA GTT CTA CAG TCC GA-3′ and the Illumina adaptor index RPI 1–6, 8,10–11. Bands obtained at 150 bp were excised from 2% agarose gels and purified using the Zymo-clean Gel Recovery Kit (Zymo, D4002). DNA concentration and library quality was determined by Qubit Fluorometer ds-DNA HS assay (Life Tech, Q32854) and BioAnalizer DNA HS Kit (Agilent 2100 Bioanalyzer; Agilent, 5067–4626). Libraries were sequenced in a NextSeq 500.
CLIP data processing
The PAR-CLIP data was processed and annotated using the PARpipe pipeline (https://github.com/ohlerlab/PARpipe) around the PAR-CLIP data tailored peak caller PARalyzer60, with one modification. In brief, adaptor sequences were trimmed retaining the four randomised adaptor nucleotides on both read ends included during adaptor ligation to serve as unique molecular identifiers during PCR-duplicate removal (read collapsing). Differences in the numbers of uniquely aligning reads were balanced by random subsampling prior to PARalyzer cluster identification. Annotation of identified binding sites (cluster) was simplified by grouping closely related sub-annotation categories (Supplementary Data 6).
De novo motif discovery
For de novo motif finding we used Zagros with default settings including RNA secondary structure information. As the majority of HuR cluster resided in intronic and 3′UTR regions (70–80%, Supplementary Data 7) according to its reported functions, we use intronic and 3′utr cluster sequences as input. For DREME motif analysis, we used all PARalyzer-derived cluster sequences and ran DREME with default settings against shuffled background sequences allowing only sense strand motif search.
For depicting spatial preferences for mRNA binding, we selected genes previously used for RNA classification based on processing and turnover dynamics77 (n = 15,120) present in GENCODE v19. To select transcripts, we ran RSEM78 and retained transcripts with TPM>3. For each gene, we selected the transcript isoform with the highest isoform percentage or chose one randomly in case of ties (n = 8298). The list of selected transcript isoforms was used to calculate the median 5′UTR, CDS and 3′UTR length proportions (5′UTR = 0.06, CDS = 0.53, 3′UTR = 0.41) using R Bioconductor packages GenomicFeatures and GenomicRanges79. For regions post annotated transcription ends and splice sites we chose windows of fixed sizes (TES 500 nt, 5′ and 3′ splice sites 250 nt each). We generated coverage tracks from the PARalyzer output alignment files and intersected those with the filtered transcripts. Each annotation category was binned according to its relative length and the coverage averaged within each bin. For intronic coverage, we averaged across all introns per gene, given a minimal intron length of 500 nt. All bins were stitched to one continuous track per transcript (n = 6632 intron containing transcripts). Each library bam file was filtered to retain only PARalyzer cluster overlapping alignments. We required transcripts to have a minimal coverage maximum of >2. For each transcript we scaled the binned coverage dividing by its maximal coverage (min-to-1 scaling) to emphasise on spatial patterns independent from transcript expression levels. Next, we split transcript coverage in two parts, separating 5′UTR to TES regions and intronic regions. To generate the scaled meta coverage across all targeted transcripts per RBP, we used the heatMeta function from the Genomation package80. For the 5′UTR to TES part, we scaled each RBP meta-coverage track independent of other libraries. For intronic sequences, we scaled each sample relative to all other sample. Finally, we clustered the meta-coverage tracks using ward.D clustering with euclidean distance.
Genome browser visualisations
PAR-CLIP alignments were visualised using Gviz81.
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
RNA-seq data have been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE113628 for PAR-CLIP RNA-seq and GSE113655 for PTex RNA-seq. Read count data for RNA classes (Supplementary Fig. 20 are available in Supplementary Data 5). The proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD009571. MaxQuant analysed-proteomics data (protein groups for HEK293 and Salmonella) are available in Supplementary Data 8 and 9, respectively. A reporting summary for this Article is available as a Supplementary Information file. All other data supporting the findings of this study are available from the corresponding author upon reasonable request.
Journal peer review information: Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We wish to thank Florian Heyd and Marco Preussner for providing mouse brain samples, Jörg Vogel for providing Hfq-FLAG-tagged Salmonella, Stuart McKellar for help with cross-linking, and Ira Iosub and Julie Bohl for assistance. We are grateful to Maja Köhn and Matthias Hentze for discussions and to Markus Landthaler for critically reading the manuscript. Work in the Beckmann lab is supported by the German Research Foundation (DFG; IRTG 2290 and ZUK 75/1 Project 0190–854599). Work in the Medenbach lab is supported by the DFG (SFB 960/2, B11), the Bavarian State Ministry for Education, Science and the Arts (Bavarian Research Network for Molecular Biosystems, BioSysNet), and the German Federal Ministry of Education and Research (BMBF, 01ZX1401D). The Granneman lab is funded by a Medical Research Council non-clinical Senior Research fellowship (MR/R008205/1). The Ohler lab is funded by NIH TRA R01 GM104962. E.C.U. is supported by the Joachim Herz Foundation and an EMBO Short-Term Fellowship No 7511.