# easyCLIP analysis of RNA-protein interactions incorporating absolute quantification

## Abstract

Quantitative criteria to identify proteins as RNA-binding proteins (RBPs) are presently lacking, as are criteria to define RBP target RNAs. Here, we develop an ultraviolet (UV) cross-linking immunoprecipitation (CLIP)-sequencing method, easyCLIP. easyCLIP provides absolute cross-link rates, as well as increased simplicity, efficiency, and capacity to visualize RNA libraries during sequencing library preparation. Measurement of >200 independent cross-link experiments across >35 proteins identifies an RNA cross-link rate threshold that distinguishes RBPs from non-RBPs and defines target RNAs as those with a complex frequency unlikely for a random protein. We apply easyCLIP to the 33 most recurrent cancer mutations across 28 RBPs, finding increased RNA binding per RBP molecule for KHDRBS2 R168C, A1CF E34K and PCBP1 L100P/Q cancer mutations. Quantitating RBP-RNA interactions can thus nominate proteins as RBPs and define the impact of specific disease-associated RBP mutations on RNA association.

## Introduction

Approaches to quantify protein–RNA cross-links on a per-molecule basis are not widely available, leading to confusion both as to what constitutes an RNA-binding protein (RBP) and as to the quantitative impact of disease-associated RBP mutations. For example, only roughly half of the proteins either in the RBP census1 or with an RNA-binding Gene Ontology (GO) term are considered RBPs by both sources. Landmark proteomic efforts from multiple groups have identified many potential novel RBPs2,3,4,5,6,7; some, such as sequestosome-1 (ref. 8), were subsequently verified and studied by cross-linking immunoprecipitation (CLIP), while the vast majority have not yet been evaluated by non-proteomic approaches9. Many important proteins studied in a different context have been categorized as also binding RNA, yet few or no experiments have been published on their functions in RNA binding. For example, proteins important to cancer, such as BRCA1, SMAD3-4, SPEN, CHD2, and JUN, have been categorized as RBPs, yet are not generally studied as such, raising the question as to whether they actually act in that role. Addressing this question for such proteins, and for additional potentially novel RBPs, has been hindered by the lack of a test that quantitates RNA interaction events per protein molecule to provide a global cutoff level of RNA binding to nominate a protein as an RBP.

Currently, there is no general method to estimate absolute RNA–protein interaction frequencies, and a quantitative test is needed to assess whether any nonrandom interaction with an RNA exists. The frequencies of RNA–protein complexes, per cell and per interaction partner, would enable the fundamental characterization of RNA–protein interaction networks. Conventional approaches to determining the targets of an RBP, such as enrichment over negative control immunopurification or clustering of cross-links10, ultimately but indirectly determine whether the absolute count of an RNA–protein complex in the cell is abnormally high. Defining RNA–protein interaction events per cell and per protein in absolute quantities, in contrast, may provide a framework for describing a global and widely reproducible view of RNA–protein interactions.

A number of RBPs are mutated in human cancers; however, the impact of such mutants on their association with RNA has not been quantitated. Most tumors have aberrant splicing without apparent mutational cause11,12, indicating that there must be unknown mutations within or affecting RNA metabolism pathways that are collectively common. There are widespread RBP expression changes in tumors, and the alternative splicing of tumor cells is predicted to affect cancer hallmarks, with some tumor types reverting to a more undifferentiated splicing pattern13. Many cancer-associated genes are potentially RBPs and some RBPs contain recurrent missense mutations. More generally, recurrent mutations that are not exceptionally frequent are of unknown significance14.

Here, we report a refinement of current CLIP protocols, termed easyCLIP. easyCLIP quantifies RNA cross-links per protein and provides visual confirmation of each step. easyCLIP enables the calculation of the distribution of cross-linking for the average protein, and we propose a quantitative threshold for whether a protein is an RBP. Establishing a distribution for non-RBPs enables the definition of specific target RNAs for any RBP as those interactions with a frequency per protein that is unlikely to occur with a randomly selected protein. easyCLIP is applied to the 33 most frequent missense mutations across 28 RBPs, identifying quantitative changes in RNA binding in specific RBPs that are recurrently mutated in cancer. easyCLIP represents a method with built-in verifications that enables quantification of the number of RNA cross-links per protein in wild-type (WT) and disease-associated mutant RBPs.

## Results

### RNA-binding proteins associated with cancer

Two lists of RNA-binding proteins (GO terms and census) were compared with the COSMIC (Catalog of Somatic Mutations in Cancer) list of cancer-associated genes15 to identify 93 RBPs associated with cancer (Fig. 1a), of which 51 had no clear structured RNA-binding domain (RBD) (Fig. 1b). Proteins without an RBD did not have a common set of domains (Fig. 1c). Notable inclusions in this list that are not well-established direct RBPs include BRCA1, BARD1, SMAD2-4, SOX2, KMT2C, SPEN, CHD2, JUN, and EZH2. The Cancer Genome Atlas (TCGA) data16 was used to rank all recurrent missense mutations in RBPs by overall frequency (Fig. 1d), which recovered the top three most studied mutations in well-established RBPs in cancer as the top three most frequent (in SF3B1, U2AF1, and SRSF2), aside from the putative RBP SMAD4, supporting the usage of raw frequency as a basis for oncogenic potential. Statistical tests for cancer driver and tumor suppressor activities for the top 29 proteins gave a range of significant values14 (Fig. 1e). The tissue specificity of these missense mutations had an expected enrichment in tumor types with a higher single-nucleotide polymorphism rate, but showed variation (Fig. 1f). Of those proteins with recurrent missense mutations and RNA-binding GO terms, many are long-established direct binders, while others are included on the basis of interactome capture datasets2,3,6 (Fig. 1g). Further analysis was conducted on RBPs identified by phase separation methods4,5,7 and additional datasets (Supplementary Fig. 1). To understand the role of RBPs in cancer, a method was needed that (1) distinguished RBPs from non-RBPs, (2) reliably produced CLIP-seq data so that it could be applied to a large protein set, and (3) provided a general definition of nonrandom interactions.

### Library preparation by easyCLIP

To generate a simpler and faster way of producing CLIP-seq datasets, a method was developed using on-bead ligations17,18,19 of 3′ adapters (termed L3) and 5′ adapters (termed L5), each with a different fluorescent dye (Fig. 2a, b). After running a sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gel and transferring to a membrane, single- and dual-ligated RNA were clearly visible (Fig. 2c and Supplementary Fig. 2). RNA was extracted from the membrane using proteinase K, purified using oligonucleotide(dT) beads to capture the poly(A) sequence on the L3 adapter, eluted, reverse transcribed, and input directly into PCR. Major differences from HITS-CLIP (high-throughput sequencing of RNA isolated by CLIP) include the usage of a chimeric DNA–RNA hybrid for efficient ligation, the purification of complexes from a membrane by oligo(dT), and the direct visualization of ligation efficiencies and finished libraries by infrared dyes (Supplementary Data 1 and please see Table 1 for how each step is verified). Since the only steps after the gel extraction are an oligonucleotide(dT) purification and reverse transcription (RT) before PCR, there are decreased opportunities for experimental failure after the diagnostic step of gel imaging. It is a caveat to all cross-linking-based studies that cross-linking is proportional to binding frequency through a poorly understood factor of complex cross-linking efficiency.

easyCLIP was benchmarked against eCLIP (enhanced CLIP) in the manner in which eCLIP was benchmarked against iCLIP (individual nucleotide resolution CLIP), using Rbfox2. easyCLIP was more efficient (Supplementary Fig. 3A), and Rbfox2 libraries were reproducible (Supplementary Figs. 3–6), including deletions from cross-linking (Supplementary Fig. 4), and correlated with Rbfox2 eCLIP (Supplementary Figs. 4–6), including matching the pattern seen with eCLIP at NDEL1 (Supplementary Fig. 4), indicating that easyCLIP captures similar information. easyCLIP was then used to generate data for 11 additional known RBPs, chosen as representatives (FBL, which associates with C/D-box small nucleolar RNA (snoRNA), and other noncoding RNA (ncRNA), hnRNP C), at random (CELF1, hnRNP D), or for their relevance to cancer (the others). Endogenous hnRNP C, RBFOX2, and FBL were immunopurified, PCBP1 was stably integrated outside its genomic locus, and the others were transiently transfected in a manner to generate low expression (“Methods”), usually below or equal to endogenous protein (Supplementary Fig. 7).

easyCLIP libraries produced high-quality data in each case (Fig. 2d–k and Supplementary Data 2–5). Directly entering sequences under the tallest peaks for all messenger RNA (mRNA)-binding RBPs into a de novo motif discovery program20 resulted in the top motifs being the expected motifs in all cases (Fig. 2e), indicating high easyCLIP signal-to-noise ratios. Results agreed with in vitro motif selection in all cases, and 9/10 had a top-3-enriched motif in >50% of peaks (Supplementary Fig. 8). Using the MACS2 peak calling algorithm on easyCLIP and ENCODE (Encyclopedia of DNA Elements) project eCLIP peaks generated favorable comparisons, with good motif coverage, peak numbers (Supplementary Fig. 9), and replicate consistency (Supplementary Fig. 10). Using enrichment over controls also recovered all motifs (Fig. 2f and Supplementary Fig. 8). Cross-linked RNA types matched expectations (Fig. 2g–h), except the transfer RNA (tRNA) binding by PCBP1, addressed below. Under false discovery rate (FDR) < 10⁻⁴ vs random non-RBPs (discussed below), target RNA numbers (Fig. 2j) and the total number of unique mapped reads were typical for CLIP (Supplementary Fig. 3D, E). Analysis of deletions indicated a level of replicate reproducibility in cross-link position, and of cross-link-induced base deletions, similar to that of other CLIP methods21 (Supplementary Fig. 10). Using deletions allowed FBL cross-linking positions within snoRNAs to be visualized in detail (Fig. 2k), and these matched previous reports22, suggesting that easyCLIP may offer an advantage over iCLIP/eCLIP-like methods for short RNAs, where reads with a reverse transcriptase stop near the 3′ end are unmappable. Together, these results indicate easyCLIP effectively captures RNA-binding information.

### Fluorescence loss

To examine loss in adapter fluorescence from CLIP over the course of the method, antisense oligonucleotides to L5 and L3 were labelled with reciprocal dyes, hereafter termed αL5 and αL3, and used to shift their cognate adapter. Such antisense oligonucleotides shift the adapter molecules up in a native gel and produce bands with a 1:1 ratio of antisense and sense oligonucleotide (Fig. 3d). L5 and L3 were purified from proteinase K extract and RNAse digested to free adapters (Fig. 3e). 100% of L5 and L3 adapters were shifted (Fig. 3f) and the method was applied to RNAse-digested CLIP oligonucleotides (Fig. 3g). By comparing the ratio of αL5 to L5 for fresh L5 vs L5 extracted from the nitrocellulose membrane in CLIP, the L5 fluorescence loss from CLIP could be determined to consistently be only ~20% (Fig. 3h).
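The fluorescence-loss estimate described above reduces to a ratio of ratios. The following is a minimal sketch, assuming band intensities have already been quantified from the native gel; the function name and the example intensities are hypothetical and are not values from this study.

```python
def fluorescence_loss(anti_fresh, sense_fresh, anti_clip, sense_clip):
    """Return the fraction of sense-adapter fluorescence lost during CLIP.

    Assumes antisense and sense oligonucleotides pair 1:1 in the shifted band,
    so the antisense signal is proportional to the number of adapter molecules.
    """
    if min(anti_fresh, sense_fresh, anti_clip) <= 0:
        raise ValueError("reference intensities must be positive")
    per_adapter_fresh = sense_fresh / anti_fresh  # signal per fresh adapter
    per_adapter_clip = sense_clip / anti_clip     # signal per CLIP-extracted adapter
    return 1.0 - per_adapter_clip / per_adapter_fresh


# Hypothetical band intensities (arbitrary units), chosen only for illustration.
loss = fluorescence_loss(anti_fresh=1000.0, sense_fresh=2000.0,
                         anti_clip=500.0, sense_clip=800.0)
print(f"L5 fluorescence loss: {loss:.0%}")
```

Normalizing to the antisense signal means the estimate does not depend on how much adapter was loaded in each lane, only on the signal per adapter molecule.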

### Cancer-associated RBPs

The most frequent missense mutations in potential RBPs in cancer were identified using TCGA data16 (Fig. 1d). Cross-link rates were determined for 33 recurrent mutations across 28 known or potential RBPs, tagged and transiently expressed in 293T cells (Fig. 5j, k and Supplementary Figs. 7 and 21–22), plus additional proteins. All expected direct RBPs cross-linked above 0.1% (Fig. 5j, k). BRCA1, CRNKL1, and BCLAF1 would pass our cutoffs, but too small an amount of protein could be purified for confident assignment, and BRCA1 libraries were largely empty, indicative of an artifact. As a group, this larger dataset supports a 0.1% cutoff for RBPs (Fig. 5j), and the type of RBD did not have a strong effect on cross-link rate (Fig. 5l). Four recurrent mutations had effects on cross-linking: L100P and L100Q of PCBP1, A1CF E34K, and KHDRBS2 R168C (Fig. 5m, n). Interestingly, all four demonstrated increases in binding, consistent with these recurrent mutations potentially being gain of function. We are not aware of a previous example of a cancer-associated mutation increasing protein binding to RNA. To study this, we integrated several proteins into the genome (Supplementary Fig. 17); KHDRBS2 R168C in melanoma cells and A1CF E34K integrated into HepG2 showed even larger increases (~2-fold) in cross-link rate vs WT controls (Fig. 5o). Integrated FUBP1 R429C and DDX3X R528C showed decreased cross-link rates vs WT (Fig. 5o and Supplementary Fig. 17). Increased cross-linking for mutant KHDRBS2 and A1CF was confirmed in vitro, with mutant forms binding RNA ≥2-fold better than WT (Fig. 5p, q). PCBP1 L100P aggregated out of solution upon cleavage of the purification tag (Supplementary Fig. 16D). These data indicate that recurrent cancer mutations in specific RBPs may be associated with differences in the RNA cross-link rate of the corresponding protein.
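As an illustration of how the 0.1% cutoff could be applied, the sketch below treats the cross-link rate as cross-linked RNA molecules per purified protein molecule, expressed as a percentage. This definition, the function names, and the example amounts are assumptions made for illustration; the quantification procedure used in the study is described in its supplementary protocol.

```python
RBP_CUTOFF_PERCENT = 0.1  # cutoff proposed in the text


def cross_link_rate_percent(crosslinked_rna_fmol, protein_fmol):
    """Cross-linked RNA molecules per protein molecule, as a percentage."""
    if protein_fmol <= 0:
        raise ValueError("protein amount must be positive")
    return 100.0 * crosslinked_rna_fmol / protein_fmol


def exceeds_rbp_cutoff(rate_percent, cutoff=RBP_CUTOFF_PERCENT):
    """True when the rate is above the cutoff."""
    return rate_percent > cutoff


# Hypothetical measurements: (fmol cross-linked RNA, fmol purified protein).
examples = {"candidate_A": (2.0, 100.0), "candidate_B": (0.02, 100.0)}
calls = {name: exceeds_rbp_cutoff(cross_link_rate_percent(rna, prot))
         for name, (rna, prot) in examples.items()}
print(calls)
```

Passing the cutoff is not sufficient on its own: as the BRCA1 case shows, a rate above 0.1% from very little purified protein, or with largely empty libraries, may still be an artifact.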

### Defining specific interactions of RBPs and non-RBPs

One of the goals of this study was to enable target RNAs to be defined for a protein of interest as those interactions with a frequency per protein or per cross-link unlikely to occur with a randomly selected protein. To address this issue, easyCLIP libraries for ten of the random non-RBPs were prepared. Using the resulting distribution of RNA interactions for random proteins, it is possible to directly estimate how unusual any RNA–protein interaction pair is. This method was first applied to frequencies per cross-link (per read) and RNA biotypes. RBPs tended to have a significant reduction in the fraction of reads mapping to ribosomal RNA (rRNA), and an increase in reads mapping to mRNA, relative to non-RBPs (Fig. 6a). Assuming that the interactions of non-RBPs are largely random, it was expected that RBPs contain similar random interactions, plus their evolved interactions. That is, the depletion of rRNA and small nuclear RNA (snRNA) in Fig. 6a is probably an artifact, not a representation of RBPs avoiding rRNA or snRNA. A more realistic view might be to use the cross-link rate to convert to cross-links per protein (Fig. 6b). This view appears more accurate in several regards. First, RBPs are not generally depleted for rRNA or snRNA interactions. Second, effect sizes are generally larger and significances higher: RBPs are more different from non-RBPs. Third, FBL has a more expected distribution, targeting snRNA and showing a greater enrichment for snoRNA and rRNA. Fourth, a number of other RBPs show more correct biological distributions (RPL5 binds rRNA, FUBP1/CELF1/RBFOX1 enrich for snRNA rather than being depleted).
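The conversion from a per-read view to a per-protein view can be sketched as a simple rescaling. This is a minimal sketch, assuming that read counts are proportional to cross-links for every biotype (the text notes elsewhere that ligation and cross-link efficiencies may differ for some ncRNAs); the read counts and rates below are hypothetical.

```python
def cross_links_per_protein(read_counts, cross_link_rate_percent):
    """Scale per-read biotype fractions by the absolute cross-link rate."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    rate = cross_link_rate_percent / 100.0
    return {biotype: rate * n / total for biotype, n in read_counts.items()}


# Hypothetical libraries and cross-link rates.
rbp_reads = {"mRNA": 800, "rRNA": 100, "snRNA": 100}
non_rbp_reads = {"mRNA": 300, "rRNA": 600, "snRNA": 100}

rbp = cross_links_per_protein(rbp_reads, cross_link_rate_percent=2.0)
non_rbp = cross_links_per_protein(non_rbp_reads, cross_link_rate_percent=0.05)

# Per read, the RBP looks depleted for rRNA (10% vs 60% of reads),
# but per protein it contacts rRNA more often than the non-RBP does.
print(rbp["rRNA"], non_rbp["rRNA"])
```

In this toy case the apparent rRNA depletion disappears once the much higher total cross-link rate of the RBP is taken into account, which is the argument made for Fig. 6b.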

Target RNAs were next identified as those bound per read at an unusually high rate. The success of this method is supported by the identification of the expected motif for all RBPs as the top motif (Fig. 2f), and target RNA types fit expectations (Fig. 2j). For example, FBL targeted snoRNA, hnRNPs targeted mRNA, and non-RBPs had few target RNAs. The sparse data for non-RBPs limits this method when applied to mRNA. Finally, target RNAs were identified as those bound per protein at an unusually high rate. Most mRNA-binding proteins appear to target most expressed mRNA at some level (Fig. 6c). The rate of cross-linking per protein was plotted for mRNAs (Fig. 6d, left), snoRNAs (middle), or tRNA (right), which suggested some fundamental results. First, the distribution of cross-linking across mRNAs, in reads per million, is similar between RBPs and non-RBPs (top left), but RBPs have more frequent mRNA partners per protein. snoRNA presents a different picture (middle). If looked at only by reads per million, it would seem that either randomly selected proteins target snoRNA or else RBPs somehow avoid it. Per protein, however, mRNA-binding RBPs and non-RBPs are equally likely to contact snoRNA—consistent with only FBL having specific interactions with snoRNA (bottom middle). This helps explain the tRNA binding by PCBP1 (Fig. 2g). Like snoRNAs, tRNAs make up a disproportionate share of the libraries of non-RBPs (top right), but per protein all RBPs and non-RBPs have the same distribution (bottom right). The distribution of tRNA binding by PCBP1 is actually just that of a non-RBP, consistent with a lack of evolved interaction with tRNA. Taken together, these data indicate that easyCLIP enables identification of target RNAs for RBPs of interest.
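One way to express "unusually high relative to a randomly selected protein" is an upper-tail probability against the non-RBP distribution, followed by multiple-testing correction. The sketch below is an assumption made for illustration, not the statistical model used in the study: it fits a normal distribution to log-transformed per-protein rates from the non-RBP set and applies the Benjamini-Hochberg adjustment. All names and values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def upper_tail_p(value, background, pseudocount=1e-9):
    """P(random protein >= value), from a normal fit to log10 background rates."""
    logs = [math.log10(b + pseudocount) for b in background]
    fit = NormalDist(mean(logs), stdev(logs))
    return 1.0 - fit.cdf(math.log10(value + pseudocount))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical cross-links per protein for one RNA across ten non-RBPs.
background = [1e-6, 2e-6, 5e-7, 1.5e-6, 8e-7, 1.2e-6, 9e-7, 3e-6, 6e-7, 1e-6]
p_high = upper_tail_p(1e-3, background)     # far above the background
p_typical = upper_tail_p(1e-6, background)  # within the background
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A parametric fit is used here because ten background proteins cannot support a purely empirical p-value below 0.1; whether a log-normal shape is appropriate would need to be checked against real data.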

### Recurrent cancer mutations tend to destabilize RBPs

A puzzling result for RBP cancer mutants was that, of the RBPs for which relative expression levels were determined, mutants were less expressed (P < 0.05, two-sided t test, Fig. 6e, f). The apparent destabilization effect was particularly strong for RARS2 R6C (Fig. 6e), a mutation that causes mitochondrial encephalopathy29 and is found in cancer (Fig. 1f). The role that such potential RBP destabilization may play in cancer remains to be explored.

### Recurrent cancer mutations alter RNA–protein interactions

easyCLIP libraries were sequenced for normal and mutant forms of PCBP1, A1CF, KHDRBS2, FUBP1, RPL5, RARS2, and SMAD4. The effect of the recurrent missense mutations on cross-linking to individual RNAs (Fig. 6g) and RNA categories (Fig. 6h) was evaluated, with PCBP1 discussed separately (Fig. 7). On a per-read basis, differences were generally slight between RNA categories (aside from PCBP1), but there were numerous differences in individual RNAs (Fig. 6g and Supplementary Data 7). While the mutant vs WT changes induced between proteins were not generally correlated (highest R² = 0.02), the lncRNA PURPL (p53 upregulated regulator of p53 levels30) was one of the five most significant changes for both KHDRBS2 and A1CF (Fig. 6g, i, j).

### L100 mutations of PCBP1

PCBP1 has recurrent missense mutations in L100, commonly L100Q, seen most frequently in colon cancer16,31 (Figs. 1f and 7a). PCBP1 is highly multifunctional beyond RNA binding32,33 and was expected to cross-link less than the average RBP, which was true (Fig. 7b–d). To test if cross-linking was specific, GxxG loop mutations were introduced in all KH domains of PCBP1, which remove the affinity of KH domains for RNA34. “GxxG PCBP1” no longer cross-linked (<0.01%, Fig. 7b, c). The first and second KH domains of the closely related PCBP2 form an intramolecular dimer, in which the β1 and α3 elements of both KH1 and KH2 bury hydrophobic residues against the other domain35. L100, in β1 of KH2, is part of this dimerization surface35, suggesting that the L100 mutants might alter conformation to impair association with RNA. Surprisingly, the opposite effect was observed: PCBP1 L100P/Q was threefold more cross-linked to RNA (Fig. 7b, c). PCBP1 L100P/Q was dramatically destabilized (Fig. 7e and Supplementary Fig. 23A). Expressing PCBP1 from a vector containing an upstream open-reading frame (uORF) that lowered expression to below that of PCBP1 L100P/Q (Fig. 7b, c) did not equalize the cross-link rate to the level observed with the L100 mutants (Fig. 7b, c), ruling out expression levels as the cause of differential RNA binding. Interestingly, when the entire KH domain containing L100 (KH2) was removed, cross-linking was approximately the same as WT (Fig. 7b–d), yet ∆KH2 PCBP1 was also destabilized (Fig. 7e and Supplementary Fig. 23A). easyCLIP thus demonstrated that the cancer-associated PCBP1 L100P/Q quantitatively increased RNA binding.

easyCLIP-seq data demonstrated that, in addition to enhancing RNA binding compared to WT, mutant PCBP1 L100P/Q bound different types of RNAs. For example, PCBP1 L100P/Q had a much smaller fraction of reads mapping to snRNA than WT (Fig. 7f), and PCBP1 L100P/Q greatly increased its association with mRNA per protein (Fig. 7g). It was therefore hypothesized that PCBP1 L100P/Q might be more cytoplasmic than WT PCBP1. This was confirmed by microscopy (Supplementary Fig. 23B and Fig. 7h). The quantifications done by easyCLIP enabled three different views of RNA–protein interactions (Fig. 7i–l). Cross-linking to snRNA by PCBP1 L100P is reduced per protein, but on a per-cell basis collapses almost completely (Fig. 7g, k). Although mutant PCBP1 interacts more often with mRNA per protein, per cell it is more similar (Fig. 7g, i, j). Altogether, these data highlight the complexity of RNA–protein interactions, and the importance of analyzing CLIP-seq data beyond traditional read count distributions.

To further explore the impacts of mutant RBPs in cancer, we analyzed the transcriptomic effects of recurrent missense mutations in PCBP1 and KHDRBS2 (Supplementary Fig. 24 and Supplementary Data 8). Both mutants led to >1000 RNAs changing levels at least 1.4-fold at FDR < 0.05 (Supplementary Fig. 24E, F). PCBP1 downregulation promotes tumorigenesis by decreased expression of CDKN1A (ref. 36). Interestingly, PCBP1 L100Q bound more to the CDKN1A 3′-untranslated region (UTR) than WT PCBP1, and CDKN1A was the eighth most significantly downregulated RNA in PCBP1 L100Q-expressing cells (Fig. 7i, m). This could not be explained by simple differences in PCBP1 protein abundance (Supplementary Fig. 24A). RNAs significantly upregulated by PCBP1 L100Q were enriched for GO terms involving cell division and extracellular matrix organization, while downregulated RNAs were highly enriched for genes involved in cell–cell adhesion (Fig. 7n). RNAs with a 3′-UTR PCBP1 peak location were more likely to be destabilized in response to PCBP1 L100Q expression (Supplementary Fig. 25A). Among the RNAs whose levels were decreased with PCBP1 L100Q and that also displayed increased PCBP1 L100Q association (Supplementary Fig. 25B) were 32 cell adhesion genes, nearly all having 3′-UTR peaks with increased PCBP1 L100Q binding (Supplementary Fig. 25C, D). These results are consistent with a possible model in which L100P/Q might contribute to tumorigenesis via increased binding to the 3′-UTR of the RNAs that encode specific cell adhesion proteins, causing their destabilization.

## Discussion

Here, we present easyCLIP as a refinement of the CLIP method that may be useful for quantifying RNA per-protein cross-link rates and for production of CLIP-seq libraries. It introduces a quantitative non-isotopic approach for absolute quantification designed to facilitate comparison between CLIP datasets obtained in any laboratory and to provide direct visualization of the success of library preparation steps. easyCLIP is designed to help address a challenge in conventional CLIP protocols in which a time-consuming workflow is performed without ready availability of visual quality verification at key steps. Like all CLIP approaches, easyCLIP relies on UV cross-linking as a proxy for in vivo associations10. easyCLIP allows multiplexing based on two adapters and determines ligation efficiency, two features that streamline the workflow and enable its quantitative robustness. Finally, on a practical level, easyCLIP’s low technical experimental failure rate and non-isotopic labelling features are designed to increase the general usability of CLIP-seq.

Limitations of the method include the fact that some proteins may shift their RNA-binding properties under different conditions37,38, complementary DNAs terminating with an RT break may be lost, ncRNAs like tRNA may have different cross-link/ligation efficiencies, and that variations in complex cross-link efficiency mean that 2% vs 5% cross-link rates in two different RBPs might both reflect 100% occupancy. Cross-link numbers from transient expression may also differ from those obtained by other expression methods. It would be difficult to compare the easyCLIP cutoff of 0.1% with statistical enrichment cutoffs in tandem mass spectrometry experiments used to determine RBPs, but RBPs identified via such methods may have cross-link rates in the <1–10% range5, consistent with similar effective cutoffs. easyCLIP suggests that there is some overlap between RBPs and non-RBPs in cross-link rate, which poses challenges for experiments unpaired with sequencing. Using both sequencing and cross-link rates may therefore be more useful than using either alone.

easyCLIP data suggest that a UV cross-link rate of >0.1% may be a feature of RBPs that is uncommon for non-RBPs. In some non-RBP cases, a rate may be near 0.1% due to technical artifact; however, sequencing the CLIP libraries will reveal these cases. Random non-RBPs have distinct binding profiles in RNA and are not easily distinguished from RBPs based on sequencing alone, although certain features, such as a 3:1 exon:intron ratio, are characteristic of non-RBPs. easyCLIP enables the definition of nonrandom interactions with RNA by reference to random non-RBPs and doing so better matches the known biology of RBPs.

In three cases, recurrent cancer-associated missense mutations in RBPs increased binding to RNA: KHDRBS2, A1CF, and PCBP1. Although these mutations are not individually highly frequent, our data suggest that they, and others in the long tail of cancer mutations, have molecular phenotypes. PCBP1 easyCLIP results are consistent with a model where the L100P/Q mutations impair the stabilizing effect of KH2 and have a gain of function for KH2 with regard to location and RNA binding. PCBP1 protein is often downregulated in cancer, which aids in tumorigenesis39. Data here suggest that L100P/Q mutations may contribute to tumorigenesis at least partly by destabilizing PCBP1. However, PCBP1 L100P/Q is primarily observed at high frequency in bowel adenocarcinoma, raising the question as to the mechanism of its potential cell-type-enriched mutagenesis and possible functional impacts. PCBP1 has been proposed to suppress tumors by binding mRNA and stabilizing tumor suppressor mRNAs, repressing translation of oncogenic mRNAs, and inhibiting oncogenic splicing39. In this regard, cell–cell adhesion genes and cell cycle genes appear especially affected by PCBP1 L100Q. To our knowledge, data presented here are the first demonstration that a disease-associated mutation in an RBP results in increased RNA association. easyCLIP helped identify these alterations and may be helpful in applying CLIP to quantify RNA cross-link rates per molecule of protein in additional future studies, including those examining RNA–protein dynamics in response to an array of stimuli.

## Methods

### Cultures

Escherichia coli BL21 cultures used to express recombinant protein were grown in standard LB broth at 37 °C with appropriate antibiotics. HEK293T (Takara Bio) and A375 (ATCC) cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) media (Thermo Fisher, #11995-065) with 10% fetal bovine serum (FBS) and 1% penicillin–streptomycin at 37 °C under 5% CO2. HCT116 cells (ATCC) were grown in McCoy’s 5A (modified) medium (Thermo Fisher, #16600108), supplemented with 10% FBS and 1% penicillin–streptomycin, and grown at 37 °C under 5% CO2. HepG2 cells (ATCC) were grown in Eagle’s minimum essential medium (ATCC 30-2003), supplemented with 10% FBS and 1% penicillin–streptomycin, and grown at 37 °C under 5% CO2.

IRDye 680RD DBCO (0.5 mg) (LI-COR, 429 nmol) was resuspended in 42.9 µL phosphate-buffered saline (PBS) for a concentration of 10 mM. The L5 linkers (Azide-DNA-RNA oligonucleotides) were ordered from IDT (Integrated DNA Technologies) and resuspended in PBS. Oligonucleotides were run through a Zymo RNA-clean-and-concentrator kit (purification was required for labelling), using ~7 µg oligonucleotide per column, and eluting at ~0.5 mg/mL (~40 µM) in water. Before binding to columns, we added ethanol to a final concentration of 67% instead of the 50% recommended by the manufacturer. During column purifications, washes were performed using an 85% ethanol in water solution made fresh each time, in place of the kit’s wash buffer. Five microliters of 10 mM dye (~50 nmol) was added to 10–150 µg purified oligonucleotide (~1–12 nmol) in PBS for a total volume of 200 µL and reacted for 2 h at 37 °C. Oligonucleotides were then run again through a Zymo clean-up kit and eluted in water. Concentrations were determined by A260 ratio using an approximate ε = 368,050 M−1. Oligonucleotides were diluted to 10 nM in ligation buffer (50 mM Tris pH 7.5, 10 mM MgCl2, 16.7% PEG400), 1 µL was blotted onto a nylon membrane, and fluorescence was measured in an Odyssey CLx machine (LI-COR). This was typically ~15,000 fluorescence units/fmol for full labelling.
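The amounts quoted above can be checked with a few lines of unit arithmetic. This is a worked sketch using only the figures stated in the text; the `labelling_fraction` helper is a hypothetical convenience and is not part of the published protocol.

```python
def concentration_mM(amount_nmol, volume_uL):
    """nmol per µL is numerically equal to mmol per L (mM)."""
    return amount_nmol / volume_uL


stock_mM = concentration_mM(amount_nmol=429, volume_uL=42.9)  # dye stock
dye_nmol = stock_mM * 5  # 5 µL of stock; mM × µL = nmol

# Molar excess of dye over the stated ~1-12 nmol of oligonucleotide.
excess_range = (dye_nmol / 12, dye_nmol / 1)

# 1 µL of a 10 nM dilution holds 10 fmol (nM × µL = fmol).
spotted_fmol = 10 * 1
expected_units = spotted_fmol * 15_000  # at ~15,000 fluorescence units/fmol


def labelling_fraction(measured_units, expected=expected_units):
    """Hypothetical helper: measured signal relative to full labelling."""
    return measured_units / expected


print(round(stock_mM, 3), round(dye_nmol, 3), excess_range, expected_units)
```

Under these figures the dye is in roughly 4- to 50-fold molar excess over the oligonucleotide, and a fully labelled 10 fmol spot would be expected to give on the order of 150,000 fluorescence units.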

### Western blot protein quantification

Following the method of Janes40, purified glutathione S-transferase (GST)-tagged protein standards were run alongside the samples to be quantified. Purified GST-hnRNPC2 and purchased FBL (Prospec, cat. enz-566) were diluted in protein dilution buffer (0.5× PBS, 0–5% glycerol, 0.05% Tween-20, 0.15 mg/mL BSA) to 200 ng/µL. Twofold dilutions down from 20 to 100 ng/µL were made for a total of eight concentrations; this solution was then delivered as 14 µL aliquots to multiple strip tubes and frozen at −80 °C. When running gels, 10 µL from each concentration was combined with 10 µL loading buffer (3.6× NuPAGE loading buffer with 10% β-mercaptoethanol), heated at 75 °C for 15 min, and loaded on a 4–12% NuPAGE gel. Standards were therefore present at ~3–2000 ng per lane. Immunoblotting against the HA epitope was performed with 1:3000 αHA conjugated to Alexa Fluor 488, incubated for 1 h at room temperature in PBS blocking buffer (LI-COR); images were taken in a GE Typhoon scanner (532 nm laser, 526SP filter, 500 PMT, 200 µm resolution). When small aliquots of immunopurification beads were loaded on a gel, BSA was first added to 0.2 mg/mL to prevent adsorption. Primary antibodies were diluted 1:1000 and secondary antibodies diluted 1:3000 for immunoblots, including nonquantitative immunoblots, unless otherwise noted.
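Quantification against standards run on the same gel can be sketched as a calibration fit followed by interpolation. The sketch below is an assumption made for illustration (a least-squares line in log-log space, with no extrapolation beyond the standards) and is not necessarily the fitting procedure used in the study; the standard amounts and signals are hypothetical.

```python
import math


def fit_log_log(standard_ng, standard_signal):
    """Least-squares line through log10(signal) vs log10(ng)."""
    xs = [math.log10(v) for v in standard_ng]
    ys = [math.log10(v) for v in standard_signal]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def estimate_ng(signal, slope, intercept, standard_signal):
    """Interpolate a sample amount; refuse to extrapolate beyond the standards."""
    if not min(standard_signal) <= signal <= max(standard_signal):
        raise ValueError("signal outside the range covered by the standards")
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical twofold standard series with an idealized linear response.
standard_ng = [1000.0, 500.0, 250.0, 125.0]
standard_signal = [50.0 * ng for ng in standard_ng]
slope, intercept = fit_log_log(standard_ng, standard_signal)
sample_ng = estimate_ng(15000.0, slope, intercept, standard_signal)
print(round(slope, 3), round(sample_ng, 1))
```

Rejecting signals outside the standard range keeps estimates within the portion of the curve that was actually measured, which is the reason for running a dilution series alongside the samples.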

### Creation of cross-linked hnRNP C standard

Four replicates of 906–1600 µg of HCT116 lysate from cross-linked cells were added to ~20 µL Protein G Dynabeads (Thermo Fisher, cat. #10003D) coupled with 25 µL (5 µg) anti-hnRNP C (4F4) antibody per replicate. Immunoprecipitation was carried out at 4 °C for ~1 h, followed by the standard easyCLIP protocol for cross-link rate determination. The RNAse digestion was performed with half of the samples treated with 0.1 U/µL RNAse ONE for 10 min, and the other half of the samples were treated with 0.05 U/µL RNAse ONE for ~5 min. The polynucleotide kinase (PNK) reaction was 14 min at 37 °C. The ligation was performed overnight (17 h) with 20 pmol L5 (barcode 23) and 2 µL high concentration of T4 RNA ligase (NEB). Samples were combined, and ~20 aliquots comprising 2.5% of the beads (~10 ng hnRNP C each, ~400 ng total purified) in ~15 µL 1.6× NuPAGE buffer were frozen in dry ice and kept long term at −80 °C. Immunoblotting was performed with ~1:3000 αhnRNP C conjugated to AF790 (Santa Cruz Biotechnology, sc-32308 AF790), which is visible on the 800 nm channel in a LI-COR Odyssey scanner, in PBS blocking buffer (LI-COR) for ~1 h at room temperature.

### Sequencing library creation: hnRNP C and FBL

HEK293T cells were grown to 30–90% confluency in petri dishes in DMEM with 10% FBS, media were removed by vacuum, and cells were washed with 4 °C PBS and UV cross-linked (254 nm) in 10 or 15 cm plates in a Stratalinker at 0.3 J/cm². After cross-linking, 1 mL 4 °C lysis buffer (15 cm plates) or 0.5 mL lysis buffer (10 cm plates) was added to each plate, and cells were harvested with a rubber spatula and frozen in dry ice. CLIP lysis buffer was as in Zarnegar et al.23, except the concentration of Triton X-100 was 1% (see Supplementary Data 1 for all buffers used for CLIP). For each hnRNP C replicate, 4 µg hnRNP C1/C2 antibody (4F4, Santa Cruz Biotechnology #sc-32308) and 20 µL Dynabeads Protein G for immunoprecipitation (Thermo Fisher, #10003D) were coupled for 1 h at room temperature before adding 600 µg of clarified HEK293T lysate and immunopurifying at 4 °C for 45–60 min. For FBL, two replicates of 4 mg clarified lysate were combined with 20 µL Fibrillarin antibody (Bethyl, #A303-891A) and 20 µL Protein G Dynabeads; immunopurification was at 4 °C for 1 h. The easyCLIP assay was performed as described in Supplementary Data 1.

### Cloning

Cloning primers are included in Supplementary Data 1 (“cloning primers” tab). Stitching reactions were performed with NEBuilder HiFi DNA Assembly (NEB, cat. # E262L) into pLEX-based vectors.

### Figure 2 cell culture

hnRNP C and FBL were purified using antibodies to the endogenous protein (Santa Cruz Biotechnology sc-32308, Bethyl A303-891A), using a >50% confluent 10 cm or 15 cm plate with HEK293T cells per replicate. RBFOX2 was purified using antibodies to the endogenous protein (Bethyl, A300-864A) and 20 million HEK293T cells. PCBP1 was stably integrated outside its genomic locus in HCT116 cells (harvesting one ~70% confluent 15 cm plate per replicate). The others were transiently transfected into HEK293T cells growing in 10 cm plates (~2 × 10⁷ cells) with a pLEX vector bearing a uORF to lower expression. Expression levels were low compared to endogenous protein (Supplementary Fig. 7).

### Figure 5 cell culture

Proteins included in Fig. 2 were expressed and purified in the same way. PCBP1 was stably integrated outside its genomic locus in HCT116 cells (harvesting one ~70% confluent 15 cm plate per replicate). The recurrently mutated proteins were transiently transfected into HEK293T cells growing in 10 cm plates (~2 × 10⁷ cells) or 15 cm plates (~3 × 10⁷ cells) with either a pLEX vector bearing a uORF to lower expression or a vector modified from pLEX to remove vector sequences to boost copy number, increase expression, or form viral particles. HA-STAU1 and randomly selected non-RBPs were expressed from a pLEX vector. Expression was 18–28 h before harvesting. WT and mutant proteins were always expressed, harvested, and processed together. Expression levels were low compared to endogenous protein (Supplementary Fig. 7).

### Ligation efficiency test by protein shift

The ligation efficiency test with hnRNP C was performed in three replicates. hnRNP C was purified by incubating 600 µg of clarified HEK293T lysate with 4 µg anti-hnRNP C1/C2 antibody for 1.5 h at 4 °C23. Beads were RNAse digested and dephosphorylated, before being split 2:1. The split corresponding to 200 µg lysate was PNK phosphorylated and 5′ ligated as described in the easyCLIP protocol. The split corresponding to 400 µg was 3′ ligated, before being split in half. One 3′ ligated split was PNK phosphorylated and 5′ ligated as described in the easyCLIP protocol. All samples were then run on a 4–12% SDS-PAGE gel (NuPAGE), transferred to nitrocellulose, and visualized. The amount of RNA that was neither 5′ nor 3′ ligated was determined by the following reasoning. First, let P5 be the probability of a 5′ ligation, and P3 be the probability of a 3′ ligation. Let a = RNA with no ligation; b = RNA with a 3′ ligation only; c = RNA with a 5′ ligation only; and d = RNA with a 5′ and 3′ ligation. Let T = the total amount of RNA. It follows that:

$$b \times c = (T \times P3(1 - P5))\times(T \times P5(1 - P3)) \tag{1}$$

$$a \times d = (T \times(1 - P5)(1 - P3))\times(T \times P5 \times P3) \tag{2}$$

Rearranging terms shows that a × d = b × c. Since d, b, and c are determined by direct visualization of fluorescence, it follows that the RNA with no ligation (a) is also known.
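The relation a × d = b × c can be turned into a small calculation. The following is a minimal sketch, not the authors' code: the function names `unligated_amount` and `ligation_summary` are hypothetical, and the inputs are assumed to be fluorescence-derived amounts of b, c, and d in the same arbitrary units. Like Eqs. (1) and (2), it assumes that 5′ and 3′ ligation occur independently.

```python
def unligated_amount(b: float, c: float, d: float) -> float:
    """Return a, the RNA with no ligation, from a * d = b * c.

    b: RNA with a 3' ligation only
    c: RNA with a 5' ligation only
    d: RNA with both a 5' and a 3' ligation
    """
    if d <= 0:
        raise ValueError("d must be positive to solve for a")
    return b * c / d


def ligation_summary(b: float, c: float, d: float) -> dict:
    """Derive the total RNA (T) and the ligation probabilities P5 and P3."""
    a = unligated_amount(b, c, d)
    total = a + b + c + d
    return {
        "a": a,
        "total": total,
        "p5": (c + d) / total,  # fraction of RNA carrying a 5' adapter
        "p3": (b + d) / total,  # fraction of RNA carrying a 3' adapter
    }


# Illustrative numbers only (not measured values): b = 40, c = 10, d = 40
summary = ligation_summary(40.0, 10.0, 40.0)
```

With these illustrative inputs, the unligated amount is 10 units out of a total of 100, corresponding to P5 = 0.5 and P3 = 0.8.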

### CLIP analysis: peak location finding

Scripts used for CLIP analysis are available at github.com/dfporter/easyCLIP. For the peak locations used for motif finding in Fig. 2f, peaks were defined by averaging signal across the genomic locus. For each RNA, reads spanning the genomic locus were converted into an array with the length of the genomic locus, with each value representing the count of 5′ read ends mapping to that position. The values were smoothed by convolution using a box with length 50 for loci of at least 2000 nucleotides, length 20 for 200–2000 nucleotides, and length 10 for <200 nucleotides. Artifacts were removed by discarding an RNA if there existed a 2-nucleotide interval in the 100 nucleotides centered around (and including) the peak that contained >80% of the total signal in that 100-nucleotide window. If reads mapped to multiple RNAs, but only one was an exon, reads were assigned to the exon. If reads overlapped with the exons of multiple RNAs, they were assigned a priority in the order rRNA, snRNA, small Cajal body-associated RNA, snoRNA, tRNA, and mRNA. If this priority list did not result in a single RNA having priority, the reads were considered ambiguous and not used for peak finding. If the smoothed array had a single maximum, it was taken to be the peak location. If there were multiple maxima (of equal height) and no maximum had more than a two-nucleotide gap from another maximum, the peak was taken as the average position between the first and last maxima. If any maximum was more than two nucleotides from another maximum, the RNA was considered to have no peak.
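The smoothing and peak-calling rules can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in the easyCLIP repository: the function names are hypothetical, the middle size bin is read as 200–2000 nucleotides, the box is an unnormalized running sum (which leaves the peak position unchanged), and the gap rule is interpreted as the distance between neighboring maxima. Read assignment and the artifact filter are omitted.

```python
def box_width(locus_length: int) -> int:
    """Box length by locus size: 50 (>= 2000 nt), 20 (200-2000 nt), 10 (< 200 nt)."""
    if locus_length >= 2000:
        return 50
    if locus_length >= 200:
        return 20
    return 10


def box_smooth(counts, width):
    """Sum the counts in a window of `width` positions around each position."""
    n = len(counts)
    left = width // 2
    right = width - left - 1
    prefix = [0]
    for value in counts:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(n):
        lo = max(0, i - left)          # clip the window at the locus edges
        hi = min(n, i + right + 1)
        smoothed.append(prefix[hi] - prefix[lo])
    return smoothed


def call_peak(counts):
    """Return the peak position for one RNA, or None if no peak is called.

    counts: per-position counts of 5' read ends across the genomic locus.
    """
    if not counts:
        return None
    smoothed = box_smooth(counts, box_width(len(counts)))
    top = max(smoothed)
    if top <= 0:
        return None
    maxima = [i for i, v in enumerate(smoothed) if v == top]
    # Assumption: "gap" means the distance between neighboring maxima.
    if any(later - earlier > 2 for earlier, later in zip(maxima, maxima[1:])):
        return None
    # A single maximum is returned as-is; a plateau gives its midpoint.
    return (maxima[0] + maxima[-1]) / 2
```

Because a box filter turns an isolated pile-up of reads into a flat plateau, the midpoint rule places the peak at the center of that plateau, while two equally tall, well-separated plateaus yield no peak.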

### CLIP analysis: statistics

Inputs to statistical analysis were either reads per million or reads per ten billion proteins, both treated the same. To speed up analysis, for the randomly selected non-RBPs constituting background, if a replicate had no reads it was assigned one-tenth the minimum positive count present in that dataset (i.e., if a dataset had one million reads, zeros were replaced with 0.1 reads per million). The average count across replicates for each protein was determined, resulting in a sample of eight values taken from the null distribution (one for each of the proteins CDK4, CHMP3, DCTN6, ETS2, IDE, ITPA, TPGS2, and UBA2). If σ²/µ was >2 for these samples, they were fit to a negative binomial, and they were fit to a Poisson if σ²/µ was <2. P values were calculated accordingly before finally adjusting all P values for each protein by the Benjamini–Hochberg method into FDR equivalents.
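The procedure can be sketched in plain Python as below. Several details are assumptions rather than statements from the text: the function names are hypothetical, the distributions are fit by the method of moments using the sample variance, the tested count is rounded to an integer before taking the upper tail, and a ratio of exactly 2 (not covered by the description) falls to the Poisson branch.

```python
import math
from statistics import mean, variance


def replace_zeros(values):
    """Replace zeros with one-tenth of the minimum positive value in the dataset."""
    floor = min(v for v in values if v > 0) / 10
    return [v if v > 0 else floor for v in values]


def upper_tail_p(count, null_values):
    """P(X >= count) under a distribution fit to the null (non-RBP) values."""
    mu = mean(null_values)
    var = variance(null_values)  # sample variance (assumption)
    k = int(round(count))        # assumption: counts are rounded to integers
    if var / mu > 2:
        # Negative binomial by method of moments (assumption)
        p = mu / var
        n = mu * mu / (var - mu)

        def log_pmf(i):
            return (math.lgamma(i + n) - math.lgamma(i + 1) - math.lgamma(n)
                    + n * math.log(p) + i * math.log1p(-p))
    else:
        # Poisson with rate equal to the null mean
        def log_pmf(i):
            return i * math.log(mu) - mu - math.lgamma(i + 1)

    below = sum(math.exp(log_pmf(i)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - below))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `upper_tail_p` would be called once per RNA with the eight averaged null values for that RNA, and the resulting P values for one protein would then be passed together to `benjamini_hochberg`. Summing the probability mass below the count is simple but loses precision for very small P values, so a statistics library with a dedicated survival function would be preferable for real analyses.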

Additional methods are described in Supplementary Data 1 and in Supplementary Methods.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

Plasmids will be provided upon request. High-throughput sequencing data are given under the accessions “GSE154168”, “GSE162366”, and “GSE131210”. Source data are provided with this paper.

## Code availability

Software is available under github.com/dfporter/easyCLIP.

## References

1. Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).
2. Castello, A. et al. Comprehensive identification of RNA-binding domains in human cells. Mol. Cell 63, 696–710 (2016).
3. Baltz, A. G. et al. The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol. Cell 46, 674–690 (2012).
4. Queiroz, R. M. L. et al. Comprehensive identification of RNA-protein interactions in any organism using orthogonal organic phase separation (OOPS). Nat. Biotechnol. 37, 169–178 (2019).
5. Urdaneta, E. C. et al. Purification of cross-linked RNA-protein complexes by phenol-toluol extraction. Nat. Commun. 10, 990 (2019).
6. Kwon, S. C. et al. The RNA-binding protein repertoire of embryonic stem cells. Nat. Struct. Mol. Biol. 20, 1122–1130 (2013).
7. Trendel, J. et al. The human RNA-binding proteome and its dynamics during translational arrest. Cell 176, 391–403.e19 (2019).
8. Horos, R. et al. The small non-coding vault RNA1-1 acts as a riboregulator of autophagy. Cell 176, 1054–1067.e12 (2019).
9. Hentze, M. W., Castello, A., Schwarzl, T. & Preiss, T. A brave new world of RNA-binding proteins. Nat. Rev. Mol. Cell Biol. 19, 327–341 (2018).
10. Ramanathan, M., Porter, D. F. & Khavari, P. A. Methods to study RNA–protein interactions. Nat. Methods 16, 225–234 (2019).
11. Dvinge, H. & Bradley, R. K. Widespread intron retention diversifies most cancer transcriptomes. Genome Med. 7, 45 (2015).
12. Kahles, A. et al. Comprehensive analysis of alternative splicing across tumors from 8,705 patients. Cancer Cell 34, 211–224.e6 (2018).
13. Sebestyén, E. et al. Large-scale analysis of genome and transcriptome alterations in multiple tumors unveils novel cancer-relevant splicing networks. Genome Res. 26, 732–744 (2016).
14. Tokheim, C. J., Papadopoulos, N., Kinzler, K. W., Vogelstein, B. & Karchin, R. Evaluating the evaluation of cancer driver genes. Proc. Natl Acad. Sci. USA 113, 14330–14335 (2016).
15. Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
16. The Cancer Genome Atlas Research Network. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113 (2013).
17. Granneman, S., Kudla, G., Petfalski, E. & Tollervey, D. Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proc. Natl Acad. Sci. USA 106, 9613 (2009).
18. Porter, D. F., Koh, Y. Y., VanVeller, B., Raines, R. T. & Wickens, M. Target selection by natural and redesigned PUF proteins. Proc. Natl Acad. Sci. USA 112, 15868–15873 (2015).
19. Benhalevy, D., McFarland, H. L., Sarshad, A. A. & Hafner, M. PAR-CLIP and streamlined small RNA cDNA library preparation protocol for the identification of RNA binding protein target sites. Protein-RNA Struct. Funct. Recognit. 118–119, 41–49 (2017).
20. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
21. Zhang, C. & Darnell, R. B. Mapping in vivo protein-RNA interactions at single-nucleotide resolution from. Nat. Biotechnol. 29, 607–614 (2011).
22. Kishore, S. et al. Insights into snoRNA biogenesis and processing from PAR-CLIP of snoRNA core proteins and small RNA sequencing. Genome Biol. 14, R45–R45 (2013).
23. Zarnegar, B. J. et al. irCLIP platform for efficient characterization of protein–RNA interactions. Nat. Methods 13, 489–492 (2016).
24. Kim, B. & Kim, V. N. fCLIP-seq for transcriptomic footprinting of dsRNA-binding proteins: Lessons from DROSHA. Methods https://doi.org/10.1016/j.ymeth.2018.06.004 (2018).
25. Cieniková, Z., Damberger, F. F., Hall, J., Allain, F. H.-T. & Maris, C. Structural and mechanistic insights into poly(uridine) tract recognition by the hnRNP C RNA recognition motif. J. Am. Chem. Soc. 136, 14536–14544 (2014).
26. Beckmann, B. M. RNA interactome capture in yeast. Methods 118–119, 82–92 (2017).
27. Beckmann, B. M. et al. The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Nat. Commun. 6, 10127 (2015).
28. Strein, C., Alleaume, A.-M., Rothbauer, U., Hentze, M. W. & Castello, A. A versatile assay for RNA-binding proteins living cells. RNA 20, 721–731 (2014).
29. van Dijk, T. et al. RARS2 mutations: is pontocerebellar hypoplasia type 6 a mitochondrial encephalopathy? JIMD Rep. 33, 87–92 (2017).
30. Li, X. L. et al. Long noncoding RNA PURPL suppresses basal p53 levels and promotes tumorigenicity in colorectal cancer. Cell Rep. 20, 2408–2423 (2017).
31. Liu, Y. et al. Comparative molecular analysis of gastrointestinal adenocarcinomas. Cancer Cell 33, 721–735.e8 (2018).
32. Meng, Q. et al. Signaling-dependent and coordinated regulation of transcription, splicing, and translation resides in a single coregulator, PCBP1. Proc. Natl Acad. Sci. USA 104, 5866–5871 (2007).
33. Makeyev, A. V. & Liebhaber, S. A. The poly(C)-binding proteins: a multiplicity of functions and a search for mechanisms. RNA 8, 265–278 (2002).
34. Hollingworth, D. et al. KH domains with impaired nucleic acid binding as a tool for functional analysis. Nucleic Acids Res. 40, 6873–6886 (2012).
35. Du, Z., Fenn, S., Tjhen, R. & James, T. L. Structure of a construct of a human poly(C)-binding protein containing the first and second KH domains reveals insights into its regulatory mechanisms. J. Biol. Chem. 283, 28757–28766 (2008).
36. Waggoner, S. A., Johannes, G. J. & Liebhaber, S. A. Depletion of the poly(C)-binding proteins alphaCP1 and alphaCP2 from K562 cells leads to p53-independent induction of cyclin-dependent kinase inhibitor (CDKN1A) and G1 arrest. J. Biol. Chem. 284, 9039–9049 (2009).
37. Sysoev, V. O. et al. Global changes of the RNA-bound proteome during the maternal-to-zygotic transition in Drosophila. Nat. Commun. 7, 12128 (2016).
38. Despic, V. et al. Dynamic RNA-protein interactions underlie the zebrafish maternal-to-zygotic transition. Genome Res. 27, 1184–1194 (2017).
39. Guo, J. & Jia, R. Splicing factor poly(rC)-binding protein 1 is a novel and distinctive tumor suppressor. J. Cell. Physiol. 234, 33–41 (2019).
40. Janes, K. A. An analysis of critical factors for quantitative immunoblotting. Sci. Signal. 8, rs2 (2015).
41. Porter, D. F. & Khavari, P. A. easyCLIP analysis of RNA-protein interactions by sequencing cross-linked RNAs or absolute quantification of cross-link rates. Protoc. Exch. https://doi.org/10.21203/rs.3.pex-1333/v1 (2021).

## Acknowledgements

We thank Brian Zarnegar for reagents, and input on experiments and their interpretation. We also thank Amin Zia, Zurab Siprashvili, and Yuning Wei for assistance. SF3B1 was obtained from a vector produced by Angelos Constantinou, provided by Marc-Henri Stern. Funding was provided by a USVA Merit Review grant BX001409 to P.A.K. and by NIAMS/NIH grants AR49737 and AR45192 to P.A.K., and 1F32AR072504 to D.F.P.; A.L.J. is a recipient of a Physician-Scientist Training Award from the Damon Runyon Cancer Research Foundation. Some data were generated on an Illumina HiSeq purchased with funds from award S10OD018220 by SFGF at Stanford. High-throughput sequencing data are given under the accessions GSE154168, GSE162366, and GSE131210.

## Author information

### Contributions

D.F.P. performed and analyzed CLIP-seq. D.F.P., W.M., and X.Y. performed and analyzed experiments. G.A.G., M.A., and D.D. performed and/or analyzed in vitro binding. A.L.J. and L.K.H.D. analyzed RNA-seq data. P.A.K. and D.F.P. planned experiments, analyzed results, and wrote the paper with input from all authors.

### Corresponding author

Correspondence to Paul A. Khavari.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Communications thanks Jernej Ule and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Porter, D.F., Miao, W., Yang, X. et al. easyCLIP analysis of RNA-protein interactions incorporating absolute quantification. Nat Commun 12, 1569 (2021). https://doi.org/10.1038/s41467-021-21623-4
