Main

Cancer cells often co-opt post-transcriptional regulatory networks to activate pro-metastatic gene expression programmes1,2,3. Therefore, all stages of the messenger RNA life cycle—including alternative splicing, post-transcriptional modification, translation and decay—have been implicated in cancer progression2,3,4. Translational control has been increasingly recognized as an important regulatory node in tumourigenesis5; however, our understanding on how translational deregulation acts in the later stages of cancer remains incomplete.

To activate one of the main routes to metastasis, cancer cells have been shown to exploit the translational upregulation of several factors involved in the epithelial-to-mesenchymal transition6,7. Furthermore, numerous studies have observed a global tendency towards 3′ untranslated region (UTR) shortening in cancer8,9,10,11,12, suggesting consequences from reduced interactions with RNA-binding proteins (RBPs) and microRNAs (miRNAs)13, including altered translation5. In some cases, these observations could be attributed to changes in the expression of specific mRNA cleavage and polyadenylation factors10,11, although in many instances the underlying molecular mechanisms remain unknown. Similarly, we have previously demonstrated that the translational reprogramming that accompanies changes in transfer RNA expression landscape drives metastasis in breast cancer14. Importantly, a systematic characterization of translational control and its links to other aspects of RNA metabolism in metastasis is still lacking.

In this Article, we applied genome-wide experimental and computational approaches to address the changes in mRNA translation that accompany metastatic progression in breast cancer. We performed ribosome profiling in both cell line- and patient-derived models of breast cancer metastasis, and used Ribolog, a novel analytical framework, to identify the underlying regulatory programmes that govern changes in the translational control landscape. By applying these tools, we identified a functional interplay between nuclear RNA processing and translational control that disrupts the expression of a metastasis-suppressive regulon.

Results

Translational reprogramming accompanies breast cancer metastasis

To capture changes in the translational landscape that are associated with breast cancer metastasis, we performed ribosome profiling (Ribo-seq) on a commonly used triple receptor negative model of breast cancer metastasis, MDA-MB-231 breast cancer cells and their lung metastatic derivative cell line, MDA-LM2 (ref. 15). We predominantly recovered 33–34-nucleotide-long ribosome-protected mRNA footprints, aligning in frame with annotated coding sequences (CDSs)16, confirming the high quality of our dataset (Extended Data Fig. 1a,b). We then sought to measure relative changes in translational activity genome-wide by calculating translational efficiency ratios (TERs) between MDA-LM2 and parental MDA-MB-231 cells.

To perform reliable differential analysis of Ribo-seq data and systematically account for possible confounders, we developed an analytical framework for comparison of translation efficiencies (TEs, representing the ratio between ribosome-protected mRNA footprint (RPF) and mRNA abundance sequencing read counts), aiming for as few a priori assumptions as possible. The resulting method, which we have named Ribolog, relies on logistic regression to model individual Ribo-seq and RNA sequencing (RNA-seq) reads. Ribolog calculates the log odds of RPF:RNA reads and its dependence on experimental covariates in order to estimate logTER (that is, log fold change in TE) and its associated raw and corrected P value in the coding transcriptome (for the details and advantages of this approach, see Methods).

First, we used Ribolog to calculate the TE changes between poorly and highly metastatic breast cancer cells, and detected numerous differentially translated mRNAs (Fig. 1a). We then assessed the impact of changes in TE on the proteome by comparing the abundance of proteins in MDA-LM2 and parental MDA-MB-231 cells using tandem mass tag labelling and mass spectrometry (TMT–MS). As expected, we observed broad changes in the proteome as cells become more metastatic (Extended Data Fig. 1c). Moreover, we observed that the changes in protein levels can be partially but not completely explained by changes in the mRNA levels (R = 0.46, P < 1 × 10–193 between mRNA and protein log fold changes), which points to regulators of protein synthesis and decay as another source of variation. Consistently, we observed that changes in protein levels showed a stronger correlation with changes in ribosome density than with differences in mRNA levels (R = 0.53, P < 1 × 10–270). To formalize differential TE as a key factor in the observed modulations, we corrected changes in the protein levels by their respective changes in mRNA abundance, and observed that the resulting measures were significantly correlated with logTER values from Ribo-seq (R = 0.3, P < 2 × 10–16; Extended Data Fig. 1d). These findings suggest that post-transcriptional regulation of TE has a significant impact on protein levels in highly metastatic cells.

Fig. 1: HNRNPC target mRNAs are translationally repressed in highly metastatic breast cancer cells and PDXs.
figure 1

a, Bottom: volcano plot showing the distribution of changes in TER (logTER) in MDA-LM2 compared with parental MDA-MB-231 cells. Statistically significant (logistic regression, P < 0.01) observations are highlighted in orange. Top: enrichment of the poly(U) motif in the mRNA 3′ UTRs as a function of logTER between MDA-LM2 and MDA-MB-231 cells. mRNAs are divided into equally populated bins based on their logTER (dashed vertical lines delineate the bins). Bins with significant enrichment (hypergeometric test, corrected P < 0.05; red) or depletion (blue) of poly(U) motifs are denoted with a bolded border. Also included are mutual information (MI) values and their associated z-scores. b, Volcano plot showing the distribution of changes in TE in highly versus poorly metastatic breast cancer PDXs, as described for a. c, Heat map showing the enrichment of poly(U) motifs among the HNRNPC binding sites (as determined by CLIP-seq) as compared with scrambled sequences (with di-nucleotide frequency held constant). The bolded border denotes a statistically significant enrichment (hypergeometric test, corrected P < 0.05; red). MI value and associated z-score are shown. d, Enrichment of the HNRNPC target mRNAs as a function of logTER between MDA-LM2 and MDA-MB-231 cells. mRNAs are binned as in a; the y axis shows the frequency of the HNRNPC targets (3′ UTR-bound) that we identified in each bin (dashed horizontal line denotes the average HNRNPC target frequency across all transcripts). Bins with significant enrichment (logistic regression, FDR <0.05; red) or depletion (blue) of HNRNPC targets are denoted with a black border. e, Enrichment patterns of HNRNPC target mRNAs as a function of logTER between highly and poorly metastatic breast cancer PDXs, as in d.

Given the extent of translational reprogramming observed in MDA-LM2 cells relative to their poorly metastatic parental line, we sought to systematically identify cis-regulatory elements in RNA that are significantly associated with the observed changes in TE. For this analysis, we used FIRE17 algorithm to search for RNA motifs that are enriched in the 3′ UTRs of mRNAs with differential TE. FIRE identified poly(U) sequence motifs that were enriched in translationally repressed mRNAs in MDA-LM2 compared with parental cells (Fig. 1a). To extend our findings to other clinically relevant models of breast cancer metastasis, we also performed Ribo-seq on two sets of poorly and highly metastatic human breast cancer patient-derived xenografts (PDXs)18,19,20 (Extended Data Fig. 1e,f). We then compared TEs in the two highly metastatic PDXs (HCI-001 and HCI-010) with those in the two poorly metastatic PDXs (HCI-002 and STG139). We observed broad differences in the translational landscape of these PDXs, and similar to the results from the breast cancer cell lines, we observed significantly reduced translation of mRNAs with poly(U) motifs in their 3′ UTRs (Fig. 1b).

HNRNPC controls the translation of its 3′ UTR-bound regulon

Poly(U) motifs are recognized by many RBPs, and therefore function in a context-dependent manner21. To identify the most likely trans-factors interacting with the poly(U) sequences in translationally repressed mRNAs, we used information from the sequence context in which the poly(U) motifs are embedded. For this analysis we used DeepBind22 algorithm and identified heterogeneous nuclear ribonucleoprotein C (HNRNPC) as the candidate most likely to bind the poly(U) motifs of interest (Extended Data Fig. 1g). In agreement with a potential role for HNRNPC involvement in a translational deregulation programme in metastatic breast cancer, HNRNPC was modestly but significantly downregulated in highly metastatic cells, both at the mRNA (log fold change −0.5, P = 0.05, determined by RNA-seq) and protein level (log fold change of −0.24, P = 0.04, determined by mass spectrometry (MS)14). HNRNPC ranks in the top 10% of proteins in MDA-MB-231 cells that can be detected by MS23, and therefore a slight relative decrease in protein levels corresponds to a large decrease in absolute HNRNPC abundance.

To explore the possibility that HNRNPC is a trans-factor that binds the identified translational regulatory poly(U) elements, we performed HNRNPC cross-linking and immunoprecipitation coupled to sequencing (CLIP-seq)24 in MDA-MB-231 cells. As controls, we also performed CLIP-seq for two other poly(U) binding proteins, TIA1 and ELAVL1, that ranked high in the DeepBind analysis (Extended Data Fig. 1g). In agreement with the existing data25 and DeepBind predictions, poly(U) motifs were significantly enriched within HNRNPC-bound sequences (Fig. 1c). Furthermore, we detected a substantial amount of HNRNPC binding to poly(U) elements in 3′ UTRs across our own as well as previously published HNRNPC CLIP-seq datasets25 (Extended Data Fig. 1h). Finally, if the poly(U) motifs in the 3′ UTRs of translationally repressed mRNAs are bound by HNRNPC, mRNAs that are bound by HNRNPC in their 3′ UTRs should also be translationally repressed in highly metastatic cells. To test this, we performed a gene-set enrichment analysis, using the set of HNRNPC-bound 3′ UTRs to assess their patterns of enrichment and depletion across the TE values from both breast cancer cell lines and PDXs. As shown in Fig. 1d,e (and Extended Data Fig. 1i,j), we observed a consistent enrichment of this HNRNPC regulon among the genes with lower TE in the highly metastatic cells. Importantly, HNRNPC regulon was substantially more translationally repressed than TIA1 or ELAVL1 targets (Extended Data Fig. 1k).

To confirm the causal role of HNRNPC in controlling the translation of this regulon, we used CRISPR interference26 (CRISPRi) to knock down HNRNPC in MDA-MB-231 cells (fold change of 0.34, P < 0.01, determined by quantitative reverse transcription polymerase chain reaction (RT–qPCR)), and used Ribo-seq to compare TEs in control and HNRNPC-deficient cells. HNRNPC knockdown (KD) affected the translational landscape in MDA-MB-231 cells and, specifically, caused translational repression of HNRNPC target mRNAs (Fig. 2a and Extended Data Fig. 2a). We found that for the most part the same HNRNPC regulon mRNAs were translationally repressed in MDA-LM2 and HNRNPC KD cells, further highlighting the role of HNRNPC as a regulator of TE (Extended Data Fig. 2b). We individually validated several targets translationally repressed in highly metastatic and HNRNPC KD cells (Extended Data Fig. 2c,d). However, HNRNPC is a predominantly nuclear protein and a known regulator of alternative splicing27, a fact that is reflected in our CLIP-seq data as well on the basis of its pervasive binding to intronic sequences (Extended Data Fig. 1h). Therefore, it was unclear how HNRNPC, as a nuclear protein, could impact the translation of its targets in the cytoplasm. We thus hypothesized that HNRNPC might act indirectly to control the translation of its targets.

Fig. 2: HNRNPC binding impacts the translation and APA of its targets.
figure 2

a, Bottom: volcano plot showing the distribution of changes in TER (logTER) in sgHNRNPC compared with sgControl MDA-MB-231 cells. Statistically significant (logistic regression, P < 0.01) observations are highlighted in orange. Top: enrichment of the HNRNPC targets as a function of logTER between sgHNRNPC and sgControl cells. mRNAs are divided into equally populated bins according to logTER (dashed vertical lines delineate the bins); the y axis shows the frequency of the HNRNPC targets that we identified in each bin (dashed horizontal line denotes the average HNRNPC target frequency across all transcripts). Bins with significant enrichment (logistic regression, FDR <0.05; red) or depletion (blue) of HNRNPC targets are denoted with a black border. Also included are mutual information (MI) values and their associated z-scores. b, Heat maps showing the enrichment of canonical poly(A) signals in the vicinity (500 nt flanking) of HNRNPC binding peaks in 3′ UTRs (as determined by CLIP-seq). The bolded border denotes a statistically significant enrichment (hypergeometric test, corrected P < 0.05; red). MI values and associated z-scores are shown. c, Venn diagram showing the overlap between HNRNPC 3′ UTR target mRNAs and mRNAs showing APA. P value calculated using hypergeometric test. d, Bottom: volcano plot showing distribution of changes in APA ratio (logAPAR; for detailed description, see Methods) in MDA-LM2 compared with MDA-MB-231 cells. Top: enrichment of the HNRNPC-bound 3′ UTRs as a function of APAR between MDA-LM2 and parental MDA-MB-231 cells; statistics as in a. e, Bottom: volcano plot showing distribution of changes in APAR in sgHNRNPC compared with sgControl cells. Top: enrichment of the HNRNPC-bound 3′ UTRs as a function of APAR between sgHNRNPC and sgControl cells; statistics as in a.

HNRNPC controls the APA of its targets

To better capture the regulatory context within which HNRNPC functions, we performed a systematic search for additional cis-regulatory elements in the vicinity of HNRNPC binding sites on 3′ UTRs. Interestingly, as shown in Fig. 2b, we observed a highly significant enrichment of canonical poly(A) signals (AAUAAA and AUUAAA)13 within the 500 nucleotide flanking regions of HNRNPC CLIP-seq peaks in 3′ UTRs. This observation led us to the hypothesis that HNRNPC controls translation by regulating 3′ UTR length via alternative polyadenylation (APA) site selection. Consistently, the majority of HNRNPC 3′ UTR targets carry annotated APA sites, significantly more than expected by chance (Fig. 2c). In line with this, HNRNPC has been previously implicated in the control of APA in other studies; however, the mechanism through which HNRNPC impacts polyadenylation remained uncertain28,29. To confirm that the changes in TE we observed in highly metastatic cells coincide with alteration in poly(A) site selection, we performed mRNA 3′-end sequencing in the parental MDA-MB-231 and MDA-LM2 cells, as well as in control and HNRNPC KD cells. To measure changes in poly(A) site selection, we tabulated the number of reads that mapped to each annotated poly(A) site across the transcriptome. We then used a quantity we call logAPAR (log fold change in proximal-to-distal APA ratio) to identify poly(A) site switches between conditions.

To assess the statistical significance of the observed changes in APA (that is, non-zero logAPAR), we developed a novel method named APAlog. APAlog runs multinomial logistic regression to test differential usage of two or more poly(A) sites per transcript, and simultaneously calculates logAPAR and its associated raw and corrected P value. APAlog functions in three modes: (1) identifying transcripts with the highest overall variability in poly(A) site usage across conditions, (2) comparing all non-canonical poly(A) sites with one canonical (reference) poly(A) site per transcript and (3) comparing all pairs of poly(A) sites per transcript (for details, see Methods). Using APAlog, we found that HNRNPC target mRNAs undergo 3′ UTR lengthening (that is, proximal-to-distal poly(A) site switch) in MDA-LM2 (Fig. 2d and Extended Data Fig. 2e) and HNRNPC KD (Fig. 2e and Extended Data Fig. 2f,g) cells.

To confirm our observations in an exogenous setting, we designed a massively parallel reporter assay that simultaneously monitors reporter mRNA poly(A) site selection and translation. We used a bidirectional promoter vector, driving mCherry and blue fluorescent protein (BFP) expression, and containing two APA sites downstream of BFP. We then cloned a library of CLIP-seq-derived HNRNPC binding sites or matched scrambled controls upstream of the proximal poly(A) site in the reporter (Extended Data Fig. 2h). We assessed the reporter protein expression in MDA-MB-231 cells by flow cytometry, and the poly(A) site choice by 3′-end RNA-seq, in both total and sorted cell populations, where we separated the cells on the basis of the BFP/mCherry ratio. HNRNPC KD led to a lower BFP/mCherry signal in bulk (Extended Data Fig. 2i) and a preferential choice of the distal poly(A) site in reporter mRNAs bearing HNRNPC binding sites when compared with matched shuffled controls (Extended Data Fig. 2j). Furthermore, lower reporter protein expression was significantly associated with longer reporter 3′ UTRs with HNRNPC binding sites as compared with shuffled controls. Importantly, this observation was dependent on HNRNPC (Extended Data Fig. 2k). In sum, our results demonstrate that HNRNPC acts as a direct mediator of an alternative poly(A) site selection programme in metastatic breast cancer with broad consequences on the translational landscape.

HNRNPC acts with PABPC4 to control APA

To obtain insights into how HNRNPC differentially controls polyadenylation of its target RNAs in parental MDA-MB-231 and MDA-LM2 cells, we immunoprecipitated HNRNPC in both cell lines and identified interacting proteins by MS. We specifically searched for ways in which the HNRNPC interactome switches between poorly and highly metastatic cells. First, in agreement with the canonical role of HNRNPC as a splicing regulator, we detected numerous splicing factors among HNRNPC interactors (Supplementary Table 1). However, when comparing the HNRNPC interactomes between poorly and highly metastatic cells, we did not observe broad changes in the interaction between HNRNPC and other splicing factors. In contrast, we found that a group of proteins implicated in mRNA transport from the nucleus was significantly depleted from the HNRNPC interactome in MDA-LM2 cells (Extended Data Fig. 3a). Upon closer inspection of the proteins in this set, we noted multiple poly(A)-binding proteins that play canonical roles in the mRNA nuclear export cascade30. We found that PABPC4 and PABPN1 were among the depleted HNRNPC interactors in MDA-LM2 cells (Fig. 3a and Extended Data Fig. 3b). We individually confirmed an RNA-dependent interaction between HNRNPC and PABPC4 or PABPN1 in MDA-MB-231 cells by co-immunoprecipitation (co-IP) and western blotting (Fig. 3b). Next, to assess which, if any, of these factors acts in concert with HNRNPC to regulate poly(A) site selection, we depleted PABPC4 and PABPN1 in MDA-MB-231 cells using CRISPRi (fold change of 0.21 and 0.69, respectively, P < 0.05, as determined by RT–qPCR) and employed 3′-end RNA-seq to compare the APA landscapes in control and KD cells. We found that PABPC4 KD, but not PABPN1 KD, resulted in APA changes similar to those observed in HNRNPC-deficient cells (Extended Data Fig. 3c,d). Importantly, PABPC4 was also downregulated in MDA-LM2 cells compared with parental MDA-MB-231 cells at both the mRNA (log fold change −0.9, P = 0.02, determined by RNA-seq) and protein level (log fold change −0.5, P < 0.002, determined by MS14).

Fig. 3: PABPC4 acts in concert with HNRNPC to control APA and directs its target mRNAs to AGO2-dependent translational repression.
figure 3

a, Volcano plot showing the distribution of changes in relative protein interaction with HNRNPC (as determined by HNRNPC or control (isotype IgG) co-IP–MS) in MDA-LM2 compared with MDA-MB-231 cells. Statistically significant (FDR-adjusted P value <0.25, generalized linear model controlling for input abundances) observations are highlighted in pink. b, Co-IPs of HNRNPC or control IgG were analysed by western blotting. RNase A was included in the lysates where indicated. Representative image from two independent experiments. c, Venn diagram showing the overlap between HNRNPC and PABPC4 3′ UTR targets (as determined by CLIP-seq and PAPER-CLIP, respectively). P value calculated using hypergeometric test. d, Bottom: volcano plot showing distribution of changes in APA ratio (logAPAR) in HNRNPC/PABPC4 double KD compared with PABPC4 KD MDA-MB-231 cells. Top: enrichment of the HNRNPC-bound 3′ UTRs as a function of APAR between sgHNRNPC/sgPABPC4 and sgControl/sgPABPC4 cells; statistics as in Fig. 2a. e, Venn diagram showing the overlap between HNRNPC 3′ UTR targets and AGO2 bound mRNAs (top) or miRNA target mRNAs (bottom), as determined by CLIP-seq analyses. P values calculated using hypergeometric test. f, Bottom: volcano plot showing distribution of changes in TER (logTER) in sgAGO2 compared with sgControl MDA-MB-231 cells. Top: enrichment of the AGO2 targets as a function of logTER between sgAGO2 and sgControl cells; statistics as in Fig. 2a. g, Enrichment of the HNRNPC targets as a function of logTER between sgAGO2 and sgControl cells. mRNAs are distributed into equally populated bins according to their logTER (the red bars on the black background show the range of values in each bin). Bins with significant enrichment (hypergeometric test, corrected P < 0.05; red) or depletion (blue) of HNRNPC targets (3′ UTR-bound) are denoted with a bolded border. Also included are mutual information (MI) value and its associated z-score.

Source data

To further investigate the HNRNPC-PABPC4 regulon in cells, we performed PABPC4 PAPERCLIP31 in MDA-MB-231 cells. First, as expected, and similar to our observations for HNRNPC, we noted that PABPC4 peaks were significantly enriched in the vicinity of canonical poly(A) signals (Extended Data Fig. 3e). Moreover, we observed that the majority (96%) of HNRNPC 3′ UTR targets were also bound by PABPC4 in vivo (Fig. 3c). To confirm that HNRNPC and PABPC4 act in concert to control the APA of HNRNPC targets, we performed 3′-end RNA-seq comparing HNRNPC/PABPC4 double KD cells with PABPC4 KD alone. Unlike Fig. 2e, in this comparison, we did not observe the significant proximal-to-distal switching within the HNRNPC regulon that we had observed in HNRNPC and PABPC4 single KD cells (Fig. 3d and Extended Data Fig. 3f). This finding indicates that the regulatory function of HNRNPC in poly(A) site selection is contingent on PABPC4 expression, and demonstrates an epistatic interaction between these two genes.

As genes associated with mRNA nuclear export were depleted from the HNRNPC interactome in highly metastatic cells, we considered that HNRNPC could impact the nuclear export of its targets. To address this, we performed subcellular fractionation of MDA-MB-231 and MDA-LM2, as well as control and HNRNPC KD cells, and extracted their nuclear and cytoplasmic RNA. As expected, we recovered mature mRNA predominantly in the cytoplasmic fraction, while pre-mRNA or known nucleus-retained RNA was recovered exclusively in the nuclear fraction (Extended Data Fig. 3g). However, we did not detect any differences between the two compartments when we assessed changes in the APA using 3′-end RNA-seq and APAlog (Extended Data Fig. 3h,i). Importantly, using the data obtained with cytoplasmic RNA, we recapitulated our findings from Fig. 2d,e, showing that longer 3′ UTR-bearing HNRNPC target mRNAs are exported to the cytoplasm, in both highly metastatic and HNRNPC-deficient cells (Extended Data Fig. 3j,k). We also repeated Ribo-seq analysis from the cytoplasmic fraction described above. Here, again, we largely recapitulated our data from Figs. 1a and 2a, showing that HNRNPC targets undergo translational repression in the cytoplasm in HNRNPC low conditions (Extended Data Fig. 3l,m).

RNA interference pathway targets the HNRNPC regulon

We reasoned that the extended 3′ UTRs carry more translationally repressive cis-regulatory elements, such as miRNA binding sites. Consistent with this hypothesis, we observed a significant overlap between HNRNPC-bound extended 3′ UTRs and miRNA/argonaute (AGO2) targets32 (Fig. 3e). miRNAs are known repressors of mRNA stability and translation33. We observed, accordingly, that mRNAs both bound by HNRNPC and containing functional miRNA binding sites32 had significantly lower TE than non-target mRNAs in MDA-LM2 compared with parental MDA-MB-231 and in HNRNPC-deficient compared with control cells (Extended Data Fig. 4a,b). Furthermore, translationally repressed mRNAs in MDA-LM2 and HNRNPC-deficient cells were enriched among AGO2 targets (Fig. 3f and Extended Data Fig. 4c). To validate that miRNAs contribute to the translational repression of HNRNPC targets via an AGO2-mediated mechanism, we performed Ribo-seq in control and HNRNPC-depleted MDA-MB-231 cells, in both control and AGO2 KD backgrounds (AGO2 fold change of 0.55, P < 0.05, as determined by RT–qPCR). We found that HNRNPC-dependent translational repression was contingent on AGO2 expression (Extended Data Fig. 4d). Similarly, when we compared TEs in control and AGO2 KD cells, we found that mRNAs with higher TEs in AGO2-depleted cells were enriched in HNRNPC-bound transcripts (Fig. 3g).

HNRNPC and PABPC4 act as suppressors of metastasis in xenograft models

We next used a xenograft mouse model of metastasis to measure the impact of perturbing this HNRNPC-mediated pathway on the metastatic capacity of the cell. We performed lung colonization assays in NOD scid gamma (NSG) mice by intravenously injecting control and HNRNPC KD cells, constitutively expressing luciferase. We monitored the metastatic burden in the lungs of these mice by in vivo bioluminescence imaging, and observed an over ten-fold increase in lung colonization capacity induced by HNRNPC KD (Fig. 4a). To ensure that these findings are generalizable to other genetic backgrounds, we repeated the experiment with HCC1806 breast cancer cells and observed a consistent increase in the metastatic capacity of these cells upon HNRNPC downregulation (Extended Data Fig. 4e). Conversely, overexpressing HNRNPC in MDA-LM2 cells reduced the metastatic colonization by breast cancer cells (Fig. 4b).

Fig. 4: HNRNPC levels impact in vivo metastatic colonization of breast cancer cells.
figure 4

a, MDA-LM2 cells stably expressing sgHNRNPC or sgControl were injected via tail vein into NSG mice. Bioluminescence was measured at the indicated times (mean values are shown, error bars indicating standard error of the mean (s.e.m.); P value calculated using two-way analysis of variance (ANOVA)); area under the curve was measured at the final timepoint (P value calculated using one-tailed Mann–Whitney U test). Lung sections were stained with H&E (representative images shown). n = 4–5 mice per cohort. b, MDA-LM2 cells stably overexpressing mCherry or HNRNPC were injected via tail vein into NSG mice. Bioluminescence was measured at the indicated times (mean values are shown, error bars indicating s.e.m.; P value calculated using two-way ANOVA); area under the curve was measured at the final timepoint (P value calculated using one-tailed Mann–Whitney U test). n = 4–5 mice per cohort. c, MDA-MB-231 cells stably expressing sgHNRNPC or sgControl were injected orthotopically into mammary fat pads of NSG mice. Tumour volume was measured at indicated times (mean values are shown, error bars indicating s.e.m.; P value calculated using two-way ANOVA); tumour volume was compared at the time of tumour resection (P value calculated using one-tailed Mann–Whitney U test). d, MDA-MB-231 cells stably expressing sgHNRNPC or sgControl were injected orthotopically into mammary fat pads of NSG mice. Four weeks later, the primary tumours were resected and lung colonization was measured at indicated times. Lung bioluminescence was measured at the final timepoint (P value calculated using one-tailed Mann–Whitney U test). Lung sections were stained with H&E (representative images shown). n = 4–5 mice per cohort.

Source data

Intravenous injection of cancer cells in mice recapitulates the last steps of metastasis, including extravasation and proliferation in the distant tissue. To assess the role of HNRNPC in the earlier steps of metastasis, we performed spontaneous metastasis assays by orthotopically injecting control and HNRNPC-depleted cells into the mammary fat pads of NSG mice. After primary tumour resection, we followed the lung colonization by cancer cells using in vivo imaging. While HNRNPC KD cells showed slightly (albeit non-significantly) reduced primary tumour volumes (Fig. 4c), it had a markedly and significantly increased metastatic lung colonization (Fig. 4d), in agreement with our data from intravenous injection assays (Fig. 4a).

In its canonical role, HNRNPC acts as a regulator of RNA splicing, and its function in metastasis may in fact be a consequence of these parallel regulatory programmes. To assess this possibility, we sought to independently test PABPC4, which acts in concert with HNRNPC to control the APA of its targets. For this, we compared the metastatic lung colonization by PABPC4 KD, as well as PABPC4/HNRNPC double KD and control MDA-MB-231 cells (Extended Data Fig. 4f). In line with PABPC4 controlling the APA of a metastasis-associated mRNA regulon, PABPC4-depleted cells showed significantly increased metastatic potential when compared with control cells. More importantly, knocking down HNRNPC in the PABPC4 KD background did not result in an increase metastatic potential of cells. This is consistent with HNRNPC and PABPC4 acting as components of the same regulatory pathway, and showing an epistatic genetic interaction.

PDLIM5 acts downstream of HNRNPC to suppress metastasis

To better understand how the deregulation of APA and TE leads to increased metastatic potential, we sought to identify relevant targets downstream of the HNRNPC–PABPC4 regulatory axis. First, to complement our ribosome profiling results, we also compared the protein abundances in control and HNRNPC KD cells using TMT-MS. We observed that (1) consistent with its role in translational control, a large number of proteins were dysregulated upon HNRNPC depletion (Extended Data Fig. 5a), and (2) changes in the protein landscape of HNRNPC KD cells were significantly correlated with those between highly and poorly metastatic cells (Extended Data Fig. 5b). In other words, a significant portion of changes in the TE of MDA-LM2 cells relative to parental MDA-MB-231 can be explained by lower HNRNPC activity. Furthermore, gene-set enrichment analysis of this data revealed that HNRNPC KD caused the downregulation of proteins interacting with SH3 (Src homology 3) domain proteins and actin filaments, among other Gene Ontology terms (Extended Data Fig. 5c).

To identify genes that are part of this HNRNPC regulon and act downstream of this pathway to influence metastatic progression, we systematically integrated the datasets comparing poorly and highly metastatic cells, as well as HNRNPC KD and control cells. We specifically searched for mRNAs that (1) were translationally repressed and (2) demonstrated proximal-to-distal poly(A) site switching in MDA-LM2 and HNRNPC-deficient cells (16 and 30 transcripts, respectively). We focused on transcripts that (3) were bound by HNRNPC and PABPC4 (1,263 mRNAs) to identify the direct downstream targets. This approach nominated PDLIM5 (Fig. 5a), a member of cytoskeleton-associated protein family34, as a robust target of this HNRNPC-mediated pathway.

Fig. 5: PDLIM5 acts downstream of HNRNPC to suppress breast cancer metastasis.
figure 5

a, The schematics of PDLIM5 3′ UTR, showing proximal and distal poly(A) sites, HNRNPC binding sites as determined by CLIP-seq, and 3′-end RNA-seq results in MDA-MB-231 and MDA-LM2, as well as sgControl and sgHNRNPC-expressing cells. CPM, counts per million. b, Western blot analysis of PDLIM5 protein in MDA-MB-231 and MDA-LM2, as well as sgControl and sgHNRNPC-expressing cells. Relative PDLIM5 quantity normalized to GAPDH is indicated below, along standard errors. c, Quantification of relative PDLIM5 proximal to distal poly(A) site usage in MDA-MB-231 and MDA-LM2 cells (left) or sgControl and sgHNRNPC cells (right), as determined by isoform-specific RT–qPCR. n = 3 biological replicates. P values calculated using one-tailed Mann–Whitney U test. d, Relative reporter FLAG-PDLIM5 protein quantity (normalized to GAPDH), expressing full-length, short and long PDLIM5 3′ UTRs, as illustrated in Extended Data Fig. 5f. n = 6 biological replicates. P value calculated using two-tailed unpaired t-test. e, MDA-MB-231 cells stably expressing sgPDLIM5 or sgControl were injected via tail vein into NSG mice. Bioluminescence was measured at the indicated times (mean values are shown, error bars indicating standard error of the mean; P value calculated using two-way analysis of variance); area under the curve was measured at the final timepoint (P value calculated using one-tailed Mann–Whitney U test). n = 3 and 5 mice per cohort. f, MDA-MB-231 cells stably expressing sgControl or sgHNRNPC and mCherry (control) or PDLIM5 (PDLIM5-OE) were injected via tail vein into NSG mice. Bioluminescence was measured at the final timepoint (P value calculated using one-tailed Mann–Whitney U test). n = 4–5 mice per cohort.

Source data

In agreement with our ribosome profiling and TMT–MS data (Extended Data Fig. 5d), we detected lower levels of PDLIM5 protein in MDA-LM2 and HNRNPC-deficient cells, as compared with control MDA-MB-231 cells (Fig. 5b). In contrast, PDLIM5 mRNA abundance was similar in these conditions, which is consistent with PDLIM5 protein levels being regulated at the translational level (Extended Data Fig. 5e). We also confirmed by isoform-specific RT–qPCR the proximal-to-distal poly(A) site switch for PDLIM5 mRNA (Fig. 5c). To further confirm the regulation conferred by the 3′ regulatory sequences in PDLIM5 mRNA, we constructed a series of reporters, where we fused N-terminal FLAG tag with PDLIM5 CDS and several genetically encoded variants of PDLIM5 3′ UTR (Extended Data Fig. 5f). We found that the sequence downstream of the proximal poly(A) site in the PDLIM5 3′ UTR is responsible for the decrease in reporter protein quantity, as compared with full-length 3′ UTR (Fig. 5d).

To assess the role of PDLIM5 in breast cancer progression, we performed in vivo lung colonization assays with control and PDLIM5-deficient MDA-MB-231 and HCC1806 cells. As shown in Fig. 5e and Extended Data Fig. 5g, PDLIM5 KD (fold change of 0.18, P < 0.01, as determined by RT–qPCR) led to a significant increase in metastatic lung colonization of xenografted mice. In our model, PDLIM5 depletion impacts breast cancer progression due to HNRNPC deficiency. Indeed, an ectopic PDLIM5 expression in HNRNPC-deficient cells was sufficient to reduce the lung metastatic burden to the levels similar to control cells with intact HNRNPC expression (Fig. 5f), in line with HNRNPC acting upstream of PDLIM5 to control breast cancer metastasis.

Cancer cells exploit multiple phenotypic routes towards metastasis, including modulating their proliferative activity, invasiveness, migration, as well as the capacity to survive in isolation. To test if HNRNPC and PDLIM5 contribute to any of these phenotypes, we performed proliferation, colony formation, migration and invasion assays with control, and HNRNPC- or PDLIM5-perturbed cells. In agreement with our lung colonization assays in vivo (Figs. 4 and 5), HNRNPC and PDLIM5 KDs caused an increase, while HNRNPC overexpression (OE) caused a decrease, in cell migration and invasive capacity in vitro (Extended Data Fig. 5h,i). These observations contrasted with the reduced proliferation and colony formation of HNRNPC-perturbed cells (Extended Data Fig. 5j–l), consistently with primary tumour growth rates of HNRNPC KD cells in vivo (Fig. 4c). Similar to our in vivo data (Fig. 5f), ectopic PDLIM5 expression rescued the proliferation and colony formation defects of HNRNPC KD cells (Extended Data Fig. 5m).

The HNRNPC–PABPC4 regulatory axis is linked with clinical outcomes

To confirm that our findings in xenograft models are generalizable to human disease, we performed survival analysis on publicly available datasets from breast cancer patients. We found that lower HNRNPC expression in breast cancer tumours was significantly associated with lower overall, disease-free and distant metastasis-free survival, both in individual cohorts and in meta-analyses (Fig. 6a–c and Extended Data Fig. 6a–c). We detected that HNRNPC expression was negatively associated with tumour stage (Fig. 6d) and presence of metastasis (Fig. 6e), but not with tumour subtype (Extended Data Fig. 6d). Notably, HNRNPC expression remained a significant covariate in a Cox proportional hazards model even after controlling for other known prognostic metrics, such as tumour stage or received treatment (Extended Data Fig. 6e).

Fig. 6: HNRNPC expression is associated with clinical outcomes in patients with breast cancer.
figure 6

a, Kaplan–Meier survival curve showing association between tumour HNRNPC levels and disease-free survival in the METABRIC cohort. b, Kaplan–Meier survival curve showing association between tumour HNRNPC levels and overall survival in the TCGA-BRCA cohort. c, Kaplan–Meier survival curve showing association between tumour HNRNPC levels and distant metastasis-free survival (DMFS) in a collection of breast cancer patient cohorts. Hazard ratios (HR) and P values (calculated using log-rank test) are shown (ac). d, HNRNPC mRNA levels across breast cancer tissue stages I–IV in the METABRIC cohort. P value calculated using one-way analysis of variance (ANOVA). Box centre reports the median value, the boundaries—the quartiles and the whiskers—and the 10th and 90th percentiles. e, HNRNPC mRNA levels in non-metastatic (M0) and metastatic (M1) breast tumours in the TCGA-BRCA cohort. P value calculated using two-tailed Mann–Whitney U test. Box plot characteristics as in d. f, Kaplan–Meier survival curve showing association between tumour PABPC4 levels and distant metastasis-free survival (DMFS) in a collection of breast cancer patient cohorts. Hazard ratios (HR) and P values (calculated using log-rank test) are shown. g, PABPC4 mRNA levels across breast cancer tissue stages I–IV in the METABRIC cohort. P value calculated using one-way ANOVA. Box plot characteristics as in d. h, Kaplan–Meier survival curve showing association between tumour PDLIM5 levels and distant metastasis-free survival (DMFS) in a collection of breast cancer patient cohorts. Hazard ratios (HR) and P values (calculated using log-rank test) are shown. i, MDA-LM2 cells treated with T4 or vehicle control (DMSO) at 3 µM for 6 h were injected via tail vein into NSG mice. Bioluminescence was measured at the indicated times (mean values are shown, error bars indicating standard error of the mean (s.e.m.); P value calculated using two-way ANOVA). Lung sections were stained with H&E (representative images shown). n = 4–5 mice per cohort. j, NSG mice were intravenously injected with MDA-LM2, and intraperitoneally injected with 10 mg kg−1 T4 or vehicle control for three consecutive days, starting on the day of cancer cell injection. Bioluminescence was measured at the indicated times (mean values are shown, error bars indicating s.e.m.; P value calculated using two-way ANOVA); area under the curve was measured at the final timepoint (P value calculated using one-tailed Mann–Whitney U test). n = 4–5 mice per cohort.

Source data

As we found HNRNPC to act upstream of a metastasis-suppressive translational programme, we identified a set of HNRNPC mRNA targets, translationally repressed and undergoing proximal-to-distal poly(A) site switching in MDA-LM2 cells to define a translational HNRNPC target signature. We observed that, in proteomic datasets from patients with breast cancer (CPTAC), lower protein levels of the HNRNPC signature, as an aggregate, were significantly associated with lower overall and progression-free survival (Extended Data Fig. 6f,g). In line with PABPC4 acting together with HNRNPC, lower PABPC4 expression was associated with worse prognostic metrics and disease progression in breast cancer patient cohorts (Fig. 6f,g and Extended Data Fig. 6h). Furthermore, in agreement with PDLIM5 being a functional effector downstream of the HNRNPC–PABPC4 axis, PDLIM5 expression was also associated with survival of patients with breast cancer (Fig. 6h and Extended Data Fig. 6i,j).

Finally, we asked whether the impact of the HNRNPC–PABPC4 deficiency in highly metastatic cells could be reversed as a potential therapeutic strategy to prevent metastasis. Recently, a target agnostic chemical screen was used to identify small molecules that impact APA or transcription termination35. We chose to test T4, a drug that was reported to induce distal-to-proximal poly(A) site switch35, which is the opposite of the observed 3′ UTR lengthening of HNRNPC targets in MDA-LM2 cells. We first confirmed that treating MDA-MB-231 cells with 5 µM T4 for 6 h induced predominantly distal-to-proximal poly(A) site switching, as assessed by 3′ end RNA-seq (Extended Data Fig. 6k). Importantly, we observed that T4 treatment can reverse the impact of HNRNPC KD on APA site selection (Extended Data Fig. 6l,m). These results suggested that T4 could potentially counteract the 3′ UTR lengthening of HNRNPC targets, and compromise the pro-metastatic programme instigated by HNRNPC deficiency. To test this possibility, we first performed dose–response measurements of T4 in MDA-LM2 cells (Extended Data Fig. 6n), and treated these cells with the 20% of maximal inhibitory concentration (IC20) of the drug (3 µM) or vehicle control for 6 h. The effect of T4 treatment on the APA remained stable for 24 h, and significant for up to 72 h post drug withdrawal (Extended Data Fig. 6o). We then performed lung colonization assays to measure changes in the metastatic capacity of cells treated in vitro with T4. As shown in Fig. 6i, T4-treated MDA-LM2 cells showed significantly reduced metastatic capacity as compared with vehicle-treated cells. Finally, we assessed if systemic treatment with T4 could prevent metastatic lung colonization. Indeed, intraperitoneal injection of T4 significantly impaired the lung colonization by MDA-LM2 cells as compared with vehicle control treatment (Fig. 6j). These observations indicate that reversing the regulatory consequences of HNRNPC deficiency can restore the metastasis-suppressive activity of its target regulon.

Discussion

In this study, we show that increased metastatic potential in breast cancer cell lines and PDXs is accompanied by a broad remodelling of the translational landscape, as demonstrated by genome-wide TE measurements derived from Ribo-seq. Using unbiased computational approaches, we discovered that translationally downregulated mRNAs in highly metastatic breast cancer cell lines and PDXs showed enriched poly(U) motifs in their 3′ UTRs. We identified a link between HNRNPC binding to these sequence elements and translational control of the bound transcript, showing that this mechanism relies on control of the APA.

We found that HNRNPC deficiency resulted in increased usage of distal poly(A) sites of target transcripts, leading to 3′ UTR lengthening of HNRNPC targets, consistent with previous reports28,29. It was suggested that HNRNPC masks strong distal poly(A) sites, thereby promoting usage of weaker proximal sites28. In line with this, we found that canonical poly(A) signals were in close proximity to HNRNPC-bound poly(U) motifs. We could counteract the proximal-to-distal poly(A) site switch caused by HNRNPC deficiency with T4, a small molecule, promoting the distal-to-proximal switch35. While the mechanism of action of T4 is not completely clear, it alters the expression levels of multiple splicing and cleavage and polyadenylation factors35, emphasizing the interplay between the two pathways.

We also showed that PABPC4 bound HNRNPC targets in vivo and interacted with HNRNPC in controlling APA. PABPC4 is a nucleus–cytoplasm shuttling factor, and is known to have context-dependent functions, overlapping those of other poly(A) binding proteins30. PABPC4 is critical for the differentiation of erythroid cells, via an interplay between AU-rich elements in 3′ UTR of target mRNAs and the shortening of poly(A) tails36. Poly(A) tail shortening is a well-known mechanism in promoting mRNA decay and downregulating translation37. It is possible that reduced PABPC4 expression in highly metastatic cells contributes to the lower TE of joint HNRNPC–PABPC4 targets via poly(A) tail shortening.

Our data suggest that the long form 3′ UTRs of HNRNPC target mRNAs harbour a greater number of miRNA binding sites and thus are more susceptible to translational repression via argonaute-mediated RNA interference8,33, although we cannot exclude other RBPs participating in this mechanism. While some studies suggest that miRNAs have a limited impact on global translational repression and destabilization of APA targets38,39, our data highlight a case where a subset of mRNAs—HNRNPC targets with 3′ UTR lengthening in highly metastatic cells—undergo AGO2-dependent translational repression. Furthermore, HNRNPC targets inherently contain a poly(U) stretch in their 3′ UTR, which might favour interaction with other poly(U) binding RBPs and lead to translational repression.

We identified PDLIM5 as a downstream target of HNRNPC and showed that PDLIM5 KD phenocopied the pro-metastatic phenotype of HNRNPC-depleted cells. PDLIM5 (PDZ and LIM domain 5) is a member of cytoskeleton-associated protein family, implicated in cell–cell, cell–extracellular matrix interactions and cell migration34. It also participates in the mechanosensing cascade via YAP/TAZ signalling40. PDLIM5 is phosphorylated by AMPK, and this modulates its function in cell migration41. Interestingly, these cellular and molecular functions were enriched among the downregulated proteins in HNRNPC-depleted cells.

We have uncovered an intricate gene regulatory programme at the intersection of APA and translational control mediated by HNRNPC and PABPC4 that plays a metastasis suppressing role in breast cancer. Our clinical association analyses suggest that HNRNPC expression, along with that of its regulon, could be used as a prognostic metric for disease progression. We also provide evidence that HNRNPC-low tumours could benefit from therapeutic strategies targeting APA.

Methods

This study complies with all relevant ethical regulations that are approved by the University of California, San Francisco (UCSF) Institutional Review Board and Institutional Animal Care and Use Committee (IACUC, approval number AN179718).

Cell culture

All cells were cultured in a 37 °C 5% CO2 humidified incubator. The MDA-MB-231 (ATCC HTB-26) breast cancer cell line, its highly metastatic derivative MDA-LM2 (ref. 15), and 293T cells (ATCC CRL-3216) were cultured in Dulbecco’s modified Eagle medium high-glucose medium supplemented with 10% foetal bovine serum (FBS), glucose (4.5 g l−1), l-glutamine (4 mM), sodium pyruvate (1 mM), penicillin (100 units ml−1), streptomycin (100 μg ml−1) and amphotericin B (1 μg ml−1) (Gibco). The HCC1806-LM2 cell line (an in vivo selected highly lung metastatic derivative of the HCC1806 breast cancer line (ATCC CRL-2335)) was cultured in RPMI-1640 medium supplemented with 10% FBS, glucose (2 g l−1), l-glutamine (2 mM), 25 mM HEPES, penicillin (100 units ml−1), streptomycin (100 μg ml−1) and amphotericin B (1 μg ml−1) (Gibco). All cell lines were routinely screened for mycoplasma with a PCR-based assay and tested negative.

Cell proliferation assays

At day zero, 103 cells per well were seeded in 96-well plates, in six biological replicates. One, three and five days later the cell proliferation was assessed using CellTiter-Glo Luminescent Cell Viability Assay (Promega). An exponential model was then used to fit a growth rate for each sample (ln(Nt-1/N1) = rt where t is measured in days), and an unpaired two-sided t-test was used to test for significant variations.

Colony formation assays

Five-hundred cells were seeded in six-well plates, in biological triplicates. After the colonies reached a desired size (1–2 weeks later), the cells were paraformaldehyde (PFA)-fixed, stained with aqueous 0.1% (w/v) crystal violet solution and destained with several washes in de-ionized water. The colonies were manually counted, and an unpaired two-sided t-test was used to test for significant variations.

Cell migration and Matrigel invasion assays

The cells were serum starved in the medium containing 0.2% FBS for 18 h, then detached and counted. A total of 4 × 105 cells were seeded onto transwell inserts (containing 8-µm-pore-size polyester membrane, Sterlitech 9328012) in four biological replicates, in 500 µl starving medium. For invasion assays, the transwells were pre-coated with 80 µl 0.15 mg ml−1 Matrigel (Corning) in starving medium, for 2 h at 37 °C. For migration assays, the transwell inserts were left untreated. The transwells were put in 24-well plates, and complete medium (containing 10% FBS) was added, 500 µl per well. Twenty-four hours later the cells from the inside of transwells were removed by scraping, and the transwells and wells were washed and PFA-fixed. The cells that had migrated through the membrance were then permeabilized in 0.1% Triton X solution in 1× phosphate-buffered saline (PBS) and DAPI-stained. The membranes were cut, mounted on glass slides and imaged on an inverted fluorescence microscope in UCSF Nikon Imaging Center. The invaded cells were manually counted, and two-tailed Mann–Whitney U test was used to test for significant variations.

T4 treatment

T4 (Enamine EN300-7536403 or Sigma SML2299) was dissolved in dimethyl sulfoxide (DMSO) at 10 mM stock concentration. For 3′-end RNA-seq, cells were treated with 5 µM T4 or an equivalent amount of DMSO for 6 h, before RNA extraction. For dose–response measurements, the cells were treated with T4 at indicated concentrations for 6 h, after which the medium was changed and the cell viability was measured 72 h later using CellTiter-Glo Luminescent Cell Viability Assay (Promega) in six biological replicates. For in vivo metastasis assays, MDA-LM2 cells were treated with an IC20 concentration of T4 (3 µM) or DMSO control for 6 h, and then immediately collected for tail vein injections. For time course experiments, MDA-LM2 cells were treated with 3 µM T4 or DMSO control for 6 h, then washed and incubated in drug-free medium until RNA extraction 24 h or 72 h later.

Gene KD and OE

MDA-MB-231, MDA-LM2 and HCC1806-LM2 cells expressing dCas9-KRAB fusion protein were constructed by lentiviral delivery of pMH0006 (Addgene #135448) and fluorescence-activated cell sorting (FACS) isolation of BFP-positive cells.

The lentiviral constructs were co-transfected with pCMV-dR8.91 and pMD2.D plasmids using TransIT-Lenti (Mirus) into 293T cells, following the manufacturer’s protocol. Virus was collected 48 h post-transfection and passed through a 0.45 µm filter. Target cells were then transduced overnight with the filtered virus in the presence of 8 µg ml−1 polybrene (Millipore).

Guide RNA sequences for CRISPRi-mediated gene KD were cloned into pCRISPRia-v2 (Addgene #84832) via BstXI-BlpI sites (for single guide RNA sequences, see Supplementary Table 2). For double KD experiments, pCRISPRia-v2 plasmid was modified to construct pCRISPRia-v2-Blast, replacing puromycin acetyltransferase by blasticidin deaminase CDSs. After transduction with single guide RNA lentivirus, MDA-MB-231, MDA-LM2 and HCC1806-LM2 CRISPRi cells were selected with 1.5 µg ml−1 puromycin or 20 µg ml−1 blasticidin (Gibco). KD of target genes was assessed by RT–qPCR as described below. We consistently observed two- to five-fold decrease in target mRNA expression, with the exception of PABPN1. PABPN1 is labelled as common essential gene in DepMap Portal database, and we observed only modest PABPN1 KD efficiency in MDA-MB-231 cells, accompanied by severe impact on cell proliferation.

The EF1α promoter-driven lentiviral reporter has been described42. PDLIM5, HNRNPC or mCherry CDSs were synthesized (Integrated DNA Technologies (IDT)) and cloned by restriction digestion (PacI) and Gibson assembly. The constructs were delivered to MDA-MB-231 or MDA-LM2 cells by lentiviral transduction and puromycin selection. The expression of transgenes was evaluated by RT–qPCR as described below. We consistently observed seven- to ten-fold increase in target mRNA expression.

RNA isolation

Total RNA for RNA-seq and RT–qPCR was isolated using the Zymo QuickRNA isolation kit with in-column DNase treatment per the manufacturer’s protocol.

RT–qPCR

Transcript levels were measured using RT–qPCR by reverse transcribing total RNA to complementary DNA (Maxima H Minus RT, Thermo), then using PerfeCTa SYBR Green SuperMix (QuantaBio) per the manufacturer’s instructions. HPRT1 was used as endogenous control (for primer sequences, see Supplementary Table 2).

Western blotting

Cell lysates were prepared by lysing cells in ice-cold RIPA buffer (25 mM Tris–HCl pH 7.6, 0.15 M NaCl, 1% IGEPAL CA-630, 1% sodium deoxycholate and 0.1% SDS) containing 1× protease inhibitors (Thermo Scientific). Lysate was cleared by centrifugation at 20,000g for 10 min at 4 °C. Samples were denatured for 10 min at 70 °C in 1× LDS loading buffer (Invitrogen) and 50 mM dithiothreitol (DTT). Proteins were separated by SDS–PAGE using 4–12% Bis-Tris NuPAGE gels, transferred to nitrocellulose (Millipore), blocked using 5% bovine serum albumin and probed using target-specific antibodies. Bound antibodies were detected using horseradish peroxidase (HRP)-conjugated secondary antibodies (anti-mouse IgG, light chain specific, HRP (Cell Signaling 91196S, 1/5,000) or anti-rabbit IgG, conformation specific, HRP (Cell Signaling 5127S, 1/5,000)) and ECL substrate (Pierce) or infra-red dye-conjugated secondary antibodies (anti-mouse IgG, IRDye 800CW (Li-Cor 926-32210, 1/10,000) or anti-rabbit IgG, IRDye 680RD (Li-Cor 926-68071, 1/10,000)) according to the manufacturer’s instructions. Primary antibodies: β-tubulin (Proteintech 66240-1-Ig, 1/10,000), HNRNPC (Santa Cruz sc-32308, 1/1,000), PABPC4 (Proteintech 14960-1-AP, 1/1,000), PABPN1 (Proteintech 66807-1-Ig, 1/1,000), PDLIM5 (Proteintech 10530-1-AP, 1/1,000), TXNRD1 (Proteintech 11117-1-AP, 1/3,000), PGK1 (Proteintech 17811-1-AP, 1/5,000), GAPDH (Proteintech 60004-1-Ig, 1/10,000) and FLAG tag (Proteintech 66008-1-Ig, 1/1,000).

Subcellular fractionation

Cells were lysed in ice-cold cytoplasmic lysis buffer (20 mM Tris pH 7.5, 0.15 M NaCl, 5 mM MgCl2, 1 mM DTT, 0.15% Igepal CA-630 and RiboLock 40 U ml−1) containing 1× protease inhibitors (Thermo Scientific) for 5 min on ice. The lysates were then centrifuged on a sucrose cushion (20 mM Tris pH 7.5, 0.15 M NaCl and 25% sucrose) at 16,000g for 10 min at 4 °C. Top layer lysate was taken as cytoplasmic fraction, and used for RNA extraction with Zymo QuickRNA isolation kit with in-column DNase treatment per the manufacturer’s protocol. The nuclear fraction (pellet) was washed in Nuclei Wash buffer (1× PBS, 1 mM ethylenediaminetetraacetic acid (EDTA) and 0.1% Triton) and spun down for 1 min at 1,150g at 4 °C. The pellet was solubilized in ice-cold RIPA buffer with RNase and protease inhibitors, and the nuclear lysates were used for RNA extraction as described above.

Reporter assays

The bidirectional CMV promoter-driven lentiviral reporter, expressing eGFP and PuroR-T2A-mCherry fusion has been described43. The eGFP open reading frame was replaced with FLAG-BFP CDS by restriction digestion (EcoRV-NheI) and Gibson cloning. A region encompassing two alternative poly(A) sites of KAT14 gene (hg38 chr20:18,187,963-18,188,401) was cloned downstream of the BFP via MluI-PacI sites. KAT14 was chosen as it satisfied two criteria: (1) the distance between the alternative poly(A) sites allowing an amplicon compatible with Illumina sequencing; (2) the usage of both poly(A) sites in MDA-MB-231 cells detectable by RT–PCR. A library of sequences (150 bp long), containing CLIP-seq-derived HNRNPC binding sites in 3′ UTRs or nucleotide-content-matched shuffled sequences (for sequences, see Supplementary Table 3), was cloned upstream of the proximal poly(A) site via MluI digestion and Gibson assembly. The reporter library was delivered to MDA-MB-231 cells by lentiviral transduction and puromycin selection. The resulting cells were then transfected with control or HNRNPC-targeting small interfering RNAs (IDT) using Lipofectamine 2000 (Thermo Scientific). Seventy-two hours post-transfection the cells were analysed by flow cytometry, determining BFP/mCherry ratio. Total RNA was extracted from a portion of transfected cells, and the remaining cells were FACS sorted into two bins, representing the top and bottom 25% of the BFP/mCherry ratio. The RNA extracted from the total and sorted cells was used to amplify the reporter RNA by RT–PCR. In brief, an anchored oligo dT primer containing a unique molecular identifier (UMI) was used for reverse transcription, and a BFP-specific primer for PCR (for primer sequences, see Supplementary Table 2). The amplicon libraries were sequenced on a paired-end Illumina sequencing run, allowing matching of the poly(A) site chosen (proximal or distal) with the library insert upstream of the proximal poly(A) site. The reads matching to each library element and poly(A) site used were compared with APAlog.

The EF1α promoter-driven lentiviral reporter has been described42. The FLAG-PDLIM5 CDS (synthesized by IDT) and PCR-amplified PDLIM5 3′ UTR fragments (for primer sequences, see Supplementary Table 2) were cloned by restriction digestion (PacI) and Gibson assembly. The reporters were delivered to MDA-MB-231 cells by lentiviral transduction and puromycin selection. The reporter protein expression was determined by western blotting.

Ribosome profiling

Ribosome profiling was performed as previously described44 with following modifications. Snap-frozen PDX tumours were cryoground into powder on dry ice, and then resuspended in ice-cold lysis buffer. The RNA concentration in the lysates was determined with the Qubit RNA HS kit (Thermo).

Monosomes were isolated using MicroSpin S-400 HR (Cytiva) columns, pre-equilibrated with 3 ml polysome buffer per column. One-hundred microlitres of digested lysate was loaded per column (two columns were used per 200 µl sample) and centrifuged 2 min at 600g. The RNA from the flow-through was isolated using the Zymo RNA Clean and Concentrator-25 kit. In parallel, total RNA from undigested lysates were isolated using the same kit.

Libraries were sequenced on a SE50 run on Illumina HiSeq4000 instrument at UCSF Center for Advanced Technologies.

To process the reads, the Ribo-seq reads were first trimmed using cutadapt (v2.3) to remove the linker sequence AGATCGGAAGAGCAC. The fastx_barcode_splitter script from the Fastx toolkit (v0.0.13) was then used to split the samples based on their barcodes. Since the reads contain UMIs, they were collapsed to retain only unique reads. The UMIs were then removed from the beginning and end of each read (two and five nucleotides, respectively) and appended to the name of each read. Bowtie2 (v2.3.5) was then used to remove reads that map to ribosomal RNAs and tRNAs, and the remainder of reads were then aligned to mRNAs (we used the isoform with the longest CDS for each gene as the representative). Subsequent to alignment, umitools (v0.3.3) was used to deduplicate reads.

RNA-seq and 3′-end RNA-seq

RNA-seq libraries (used for calculating TEs) were prepared using SMARTer Stranded Total RNA-Seq Kit v2 Pico Input Mammalian (Takara), with 50 ng total RNA as input. The 3′-end RNA-seq libraries (used for determining poly(A) site usage) were prepared using QuantSeq 3′ mRNA-Seq Library Prep Kit REV (Lexogen), with 500 ng total RNA as input. Libraries were sequenced as SE50 runs on Illumina HiSeq4000 instrument at UCSF Center for Advanced Technologies.

To compare changes in 3′ UTR usage and poly(A) site selection, we first annotated unique 3′ ends of transcripts using Gencode annotations (v33). Salmon (v0.14.1) was then used to count the number of reads that match each of the annotated ends. The normalized abundances were then tabulated, and APAlog (see below) was used to perform pairwise comparisons between proximal and distal poly(A) sites between conditions.

To assess whether HNRNPC binding was associated with the observed changes in logAPAR values, proximal sites within 500 nt of annotated HNRNPC peaks (based on CLIP-seq datasets) were annotated. The behaviour of these HNRNPC-associated proximal sites was then compared with the background using Wilcoxon rank sum test. Alternatively, logAPAR values were binned into equally populated bins, and the enrichment/depletion patterns of HNRNPC-associated proximal sites were assessed as previously described23.

HNRNPC CLIP-seq

CLIP-seq for endogenous HNRNPC in MDA-MB-231 cells was performed using irCLIP45, with the following modifications. The cells were crosslinked with 400 mJ cm−2 254 nm UV. Cells were lysed in CLIP lysis buffer (1× PBS, 0.1% SDS, 0.5% sodium deoxycholate and 0.5% IGEPAL CA-630) supplemented with 1× protease inhibitors (Thermo) and SUPERaseIN (Thermo), then treated with DNase I (Promega) for 5 min at 37 °C. Lysate was clarified by spinning at 21,000g at 4 °C for 15 min. RNA–protein complexes were then immunoprecipitated from the clarified lysate using protein G Dynabeads (Thermo) conjugated to anti-HNRNPC (Santa Cruz sc-32308) for 2 h at 4 °C. Beads were washed sequentially with high-stringency buffer, high-salt buffer and low-salt buffer. RNA–protein complexes were then nuclease treated on-bead with RNase A (Thermo), and then ligated to the irCLIP adaptor using T4 RNA ligase (NEB) overnight at 16 °C. RNA–protein complexes were then eluted from beads, resolved on a 4–12% Bis-Tris NuPAGE gel, transferred to nitrocellulose, then imaged using an Odyssey Fc instrument (Licor). Regions of interest were excised from the membrane, and the RNA was isolated by Proteinase K digestion followed by pulldown with oligo d(T) magnetic beads (Thermo). The resulting RNA was then reverse transcribed using Superscript IV RT (Invitrogen) and a barcoded RT primer, purified using MyOne C1 Dynabeads (Invitrogen) and then circularized using CircLigase II (Epicentre). Two rounds of PCR were then performed to first amplify the library using adaptor-specific primers and to add sequences compatible with Illumina sequencing instruments. The libraries were then sequenced as SE50 runs on Illumina HiSeq4000 instrument at UCSF Center for Advanced Technologies.

The CTK package (v1.1.13, CLIP toolkit46) was used to annotate peaks from CLIP-seq data. Reads were first collapsed using their UMIs. UMI-tools package was then used to extract the UMI followed by quality trimming (-q 15) and linker removal using cutadapt. BWA (v0.7.17) was then used to align reads to the genome (hg38). CTK scripts were then used to remove PCR duplicates, parse alignments and call peaks using a valley-seeking algorithm (with multi-testing correction). The boundaries of the resulting peaks were combined across multiple independent CLIP experiments, and their union with a previously published HNRNPC iCLIP (E-MTAB-1371) was used to define a comprehensive HNRNPC binding across the transcriptome. To identify motifs, sequences across annotated peak were extracted; control sequences were generated by scrambling the real sequences while maintaining dinucleotide sequence content. FIRE17 was then used to find enriched sequence motifs.

PABPC4 PAPERCLIP and TIA1, ELAVL1 CLIP-seq

The MDA-MB-231 cells were crosslinked with 400 mJ cm−2 254 nm UV. Cells were lysed in CLIP lysis buffer (1× PBS, 0.1% SDS, 0.5% sodium deoxycholate and 0.5% IGEPAL CA-630) supplemented with 1× protease inhibitors (Thermo) and SUPERaseIN (Thermo), then treated with DNase I (Promega) for 5 min at 37 °C. Lysates were then split in half and separately treated with medium and low dilutions of RNaseA and RNaseI (Thermo; 1/3,000 RNaseA and 1/100 RNaseI, and 1/15,000 RNaseA and 1/500 RNaseI, respectively). Lysates were then clarified by spinning at 21,000g at 4 °C for 15 min. Clarified lysates were pooled, and RNA–protein complexes were then immunoprecipitated using protein A/G beads (Pierce) conjugated to anti-PABPC4 (Proteintech 14960-1-AP), or anti-TIA1 (Proteintech 12133-2-AP), or anti-ELAVL1 (Proteintech 11910-1-AP) for 2 h at 4 °C. Beads were then washed sequentially with low-salt buffer, high-salt buffer and PNK buffer. Protein-bound RNAs were end-repaired on beads using T4 PNK (NEB) and 3′-end labelled with azide-dUTP using yeast poly(A) polymerase (Jena). The protein–RNA complexes were labelled with IRDye800-DBCO conjugates (LiCor). The protein–RNA complexes were then eluted from beads, resolved on a 4–12% Bis-Tris NuPAGE gel, transferred to nitrocellulose and imaged using an Odyssey Fc instrument (LiCor). Regions of interest were excised from the membrane, and the RNA was isolated by Proteinase K digestion and phenol/chloroform extraction. Eluted RNA was used for library preparation using SMARTer smRNA-Seq Kit (Takara), with following modifications. The poly(A) tailing step was omitted, and reverse transcription was performed with a custom RT primer (Supplementary Table 2). The library PCR was performed with index forward (i5) primers and universal reverse (P7) primer (Supplementary Table 2). The libraries were purified using Zymo Select-a-Size beads and sequenced as a SE50 run on Illumina HiSeq4000 instrument at UCSF Center for Advanced Technologies.

TMT–MS

The cell lysates were prepared, digested and labelled using TMT10plex Isobaric Mass Tagging Kit (Thermo), as per the manufacturer’s instructions. The labelling reactions were cleaned up and fractionated using Pierce High pH Reversed-Phase Peptide Fractionation Kit (Thermo).

Peptides were analysed on a Thermo Fisher Orbitrap Fusion Lumos Tribid MS system equipped with an Easy nLC 1200 ultrahigh-pressure liquid chromatography system interfaced via a Nanospray Flex nanoelectrospray source. Samples were injected on a C18 reverse phase column (25 cm × 75 µm packed with ReprosilPur C18 AQ 1.9 µm particles). Peptides were separated by a gradient from 5% to 32% acetonitrile in 0.02% heptafluorobutyric acid over 120 min at a flow rate of 300 nl min−1. Spectra were continuously acquired in a data-dependent manner throughout the gradient, acquiring a full scan in the Orbitrap (at 120,000 resolution with an AGC target of 400,000 and a maximum injection time of 50 ms) followed by ten MS/MS scans on the most abundant ions in 3 s in the dual linear ion trap (turbo scan type with an intensity threshold of 5,000, CID collision energy of 35%, AGC target of 10,000, maximum injection time of 30 ms and isolation width of 0.7 m/z). Singly and unassigned charge states were rejected. Dynamic exclusion was enabled with a repeat count of 1, an exclusion duration of 20 s and an exclusion mass width of ±10 p.p.m. Data were collected using the MS3 method47 for obtaining TMT tag ratios with MS3 scans collected in the orbitrap at a resolution of 60,000, HCD collision energy of 65% and a scan range of 100–500.

Protein identification and quantification were done with Integrated Proteomics Pipeline (IP2, Integrated Proteomics Applications) using ProLuCID/Sequest, DTASelect2 and Census48,49. Tandem mass spectra were extracted into ms1, ms2 and ms3 files from raw files using RawExtractor50 and were searched against the UniProt human protein database plus sequences of common contaminants, concatenated to a decoy database in which the sequence for each entry in the original database was reversed51. Search space included all fully tryptic peptide candidates with no missed cleavage restrictions. Carbamidomethylation (+57.02146) of cysteine was considered a static modification; TMT tag masses, as given in the TMT kit product sheet, were also considered static modifications. We required one peptide per protein and both tryptic termini for each peptide identification. The ProLuCID search results were assembled and filtered using the DTASelect program with a peptide false discovery rate (FDR) of 0.001 for single peptides and a peptide FDR of 0.005 for additional peptides for the same protein. Under such filtering conditions, the estimated FDR was between zero and 0.06 for the datasets used. Quantitative analysis on MS3-based MultiNotch TMT data was analysed with Census 2 in IP2 platform47,52,53. As TMT reagents are not 100% pure, we referred to the Thermo Fisher Scientific TMT product data sheet to obtain purity values for each tag and normalized reporter ion intensities. While identification reports best hit for each peptide, Census extracted all PSMs that can be harnessed to increase accuracy from reporter ion intensity variance. Extracted reporter ions were further normalized by using total intensity in each channel to correct sample amount error.

Co-IP–MS

MDA-MB-231 and MDA-LM2 cells (10 × 106 per replicate) were washed with ice-cold 1× PBS and lysed in nuclei lysis buffer (100 mM Tris–HCl pH 7.5, 0.5% SDS and 1 mM EDTA) containing 1× protease inhibitors (Thermo Scientific) on ice for 10 min. The lysates were then diluted with four volumes of IP dilution buffer (62.5 mM Tris–HCl pH 7.5, 187.5 mM NaCl, 0.625% Triton X-100 and 1 mM EDTA) with protease inhibitors and passed through a 25 G needle several times. The lysates were cleared 10 min at 21,000g at +4 °C and used for IP.

For co-IP–MS analysis, HNRNPC antibody was covalently bound to the magnetic beads. For this, HNRNPC antibody (Santa Cruz sc-32308) or mouse IgG (Jackson 015-000-003) was first purified using Protein A/G beads (Thermo). Briefly, 3 µg of antibody were bound to 15 µl Protein A/G beads (per IP replicate) in Modified Coupling buffer (20 mM sodium phosphate pH 7.2, 315 mM NaCl, 0.1 mM EDTA, 0.1% IGEPAL CA-630 and 0.5% glycerol) and incubated 15 min at room temperature. Then the beads were washed twice in modified coupling buffer and once in coupling buffer (20 mM sodium phosphate pH 7.2 and 300 mM NaCl), and the antibody was eluted in 0.1 M sodium citrate buffer (pH 2.5) for 5 min at room temperature. After neutralization with 1/10 volume of 1 M sodium phosphate buffer (pH 8), the antibody was coupled to M270 Epoxy Dynabeads (Thermo Scientific) in ammonium sulfate buffer (0.1 M sodium phosphate pH 7.4 and 1.2 M ammonium sulfate, final concentration) overnight at 37 °C. Before usage, the antibody conjugated beads were washed four times in 1× PBS, once in 1× PBS supplemented with 0.5% Tween-20 and resuspended in 1× PBS.

Protein complexes were immunoprecipitated with antibody-conjugated beads for 2 h at 4 °C, washed three times in wash buffer (15 mM Tris–HCl pH 7.5, 150 mM NaCl and 0.1% Triton X-100) and eluted in 1× NuPage LDS sample buffer with 0.1 M DTT for 10 min at 70 °C. Eluates were then subjected to alkylation, detergent removal, and trypsin digestion using Filter Aided Sample Preparation protocol54, followed by desalting using StageTips55. Desalted peptides were subsequently lyophilized by vacuum centrifugation, resuspended in 7 μl of A* buffer (2% acetonitrile, 0.5% acetic acid and 0.1% trifluoroacetic acid in water), and analysed on a Q-Exactive plus Orbitrap mass spectrometer coupled with a nanoflow ultimate 3000 RSL nano HPLC platform (Thermo Fisher), as described before56. Briefly, 6 μl of each peptide sample was resolved at 250 nl min−1 flow rate on an Easy-Spray 50 cm × 75 μm RSLC C18 column (Thermo Fisher), using a 123 min gradient of 3% to 35% of buffer B (0.1% formic acid in acetonitrile) against buffer A (0.1% formic acid in water), followed by online infusion into the mass spectrometer by electrospray (1.95 kV, 255C). The mass spectrometer was operated in data-dependent positive mode. A TOP15 method in which each MS scan is followed by 15 MS/MS scans was applied. The scans were acquired at 375–1,500 m/z range, with a resolution of 70,000 (MS) and 17,500 (MS/MS). A 30 s dynamic exclusion was applied. MaxQuant (v1.6.3.3) was used for all MS search and protein quantifications. All downstream MS data analysis was performed using Perseus (v1.6.2.3).

For co-IP/western blot analysis, when indicated, the lysates were pre-treated with RNaseA (10 µg RNaseA per 1 mg lysate, 10 min on ice) and incubated with HNRNPC antibody or mouse IgG overnight at +4 °C. The protein complexes were then immunoprecipitated with Protein A/G beads for 2 h at +4 °C, washed three times with wash buffer (15 mM Tris–HCl pH 7.5, 150 mM NaCl and 0.1% Triton X-100) and eluted in 1× NuPage LDS sample buffer with 0.1 M DTT for 10 min at 70 °C.

Metastatic colonization assay

Mice were housed in accordance with UCSF IACUC protocol (approval number AN179718) in humidity- and temperature-controlled rooms on a 12 h light–dark cycle with free access to food and water. Seven- to 12-week-old age-matched female NSG mice (Jackson Labs, 005557) were used for lung colonization assays. For this assay, cancer cells constitutively expressing luciferase were suspended in 100 μl PBS and then injected via tail vein (2.5 × 104 for MDA-LM2, 5 × 104 for MDA-MB-231 and 1 × 105 for HCC1806-LM2). Each cohort contained four to five mice, which in NSG background is enough to observe a more than two-fold difference with 90% confidence. Mice were randomly assigned into cohorts. Cancer cell growth was monitored in vivo at the indicated times by retro-orbital injection of 100 µl of 15 mg ml−1 luciferin (PerkinElmer) dissolved in 1× PBS, and then measuring the resulting bioluminescence with an IVIS instrument and Living Image software (PerkinElmer). Mice were killed before the normalized photon flux in the lung region reached 5 × 108; this limit was not exceeded. For systemic T4 treatment, the animals were intraperitoneally injected with the drug (10 mg kg−1) or a vehicle control (95% corn oil and 5% DMSO) for three consecutive days57, starting on the day of cancer cell intravenous injection. For spontaneous metastasis assays, 1 × 105 MDA-MB-231 cells mixed with 25 μl of Matrigel (Corning) were injected into mammary fat pads of NSG mice. Tumour volume was assessed by caliper measurements and resected 4 weeks after cancer cell injection. Tumours were resected before the tumour volume reached 1,000 mm3 (or 1.5 cm in diameter); this limit was not exceeded. Lung colonization was detected using in vivo bioluminescence as described above.

Histology

For gross macroscopic metastatic nodule visualization, mouse lungs (from each cohort) were extracted at the endpoint of each experiment, and 5-μm-thick lung tissue sections were haematoxylin and eosin (H&E) stained. The number of macroscopic nodules was then recorded for each section. An unpaired t-test was used to test for significant variations.

PDXs

Primary tumours of established triple-negative breast cancer PDX models (HCI-001, HCI-002, HCI-010 and STG139) were generated in NSG mice as described before18,19. Original human tissues for generating PDX models were received as de-identified samples, and all subjects provided written informed consent. Medical reports were obtained without personally identifiable information. The UCSF IACUC reviewed and approved all animal experiments. The metastatic potential of the PDXs was determined when primary tumours reached 2.5 cm in diameter20. For histological analysis, the middle and postcaval lobes of the right lung were fixed in 4% PFA overnight and processed for paraffin embedding. Lung sections were stained with H&E using standard protocols and manually analysed for the presence of metastasis using a Leica DMR microscope. For ribosome profiling, the tumours were collected at the size of 1.0 cm diameter and snap frozen immediately. Tumours were stored at −80 °C until further processed as described above.

Computational tools

Ribolog

Unlike differential gene expression analysis using RNA-seq data, which involves comparing two or more count numbers, modelling changes in TE requires comparing ratios between conditions (TE corresponds to a ratio between ribosome protected mRNA footprint and mRNA abundance counts). The main outcome of interest in ribosome profiling, TER, is the ratio of these two TEs. Since the introduction of ribosome profiling, several analytical packages have been developed that largely inherit the assumptions of prior methods originally designed for RNA-seq data analysis58. A closer evaluation of the underlying assumptions used in many of these tools, for example, negative binomial distribution of read counts, revealed that reliable estimation of parameters such as overdispersion required many more biological replicates than are commonly generated in studies of translational control59,60. Moreover, the ratio of two NB variables does not follow any known statistical distribution; therefore, inference on TER using parametric significance tests remains a challenge. We thus sought to devise a new analytical framework for reliable comparison of TEs across conditions with fewer a priori assumptions. The resulting method, which we have named Ribolog, relies on logistic regression to model individual Ribo-seq and RNA-seq reads to estimate logTER (that is, log fold change in TE) and its associated P value across the coding transcriptome.

Before entering the Ribolog pipeline, RNA and RPF fastq files are pre-processed, as described above. Sorted and indexed bam files are imported into Ribolog, and mapped reads are assigned to specific codons using functions borrowed from the R package riboWaltz61.

Stalling bias detection and correction

Suboptimal codons, RNA secondary structures and activation of RBP binding sites may stall translation at certain codons and produce peaks of RPF reads that stand out against the CDS background. If not removed, stalling reads will be counted in with other RPF reads and lead to an overestimation of TE. This may lead to false inferences because stalling reads signify locally obstructed translation and should not be misconstrued as a sign of overall increase in translation rate. We developed a new metric, the Consistent Excess of Loess Predictions (CELP) bias coefficient to measure the strength of stalling bias at each codon. First, we smooth out the observed codon counts along the transcript for each sample using the loess function to produce loess predicted counts. Then, we calculate the CELP bias coefficient and the bias-corrected read count as:

$$b_{ij} = \left( {\mathop {\prod}\limits_K {\frac{{y_{ijk}^l}}{{M_{jk}^l}}} } \right)^{\frac{1}{k}}$$
$$y_{ijk}^{l,c} = \frac{{y_{ijk}^l}}{{b_{ij}}}$$
$$y_{jk}^{l,C} = \mathop {\sum}\limits_i {y_{ijk}^{l,c}}$$

where:

bij: CELP stalling bias coefficient at codon i, gene j

\(y_{ijk}^l\): loess predicted (smoothed) read count at codon i, gene j, sample k

\(M_{jk}^l\): median of non-zero loess-smoothed counts in gene j, sample k

\(y_{ijk}^{l,c}\): bias-corrected read count at codon i, gene j, sample k

\(y_{jk}^{l,C}\): bias-corrected RPF read count of gene j, sample k.

The reasons for using loess-smoothed counts and not raw counts in the above calculations are three-fold: (1) In our experience with multiple ribosome profiling datasets, we have observed that stalling peaks often appear in the same approximate position, but not necessarily the same exact codon, even among replicates of a single biological sample. (2) Some of the factors that impede translation, for example, RBP binding or RNA secondary structures, affect several adjacent codons, not a single codon. (3) Calculation of P-site offset and assignment of RPF reads to specific codons carries a degree of uncertainty, because the distance of read ends from start or stop codon, which is used to estimate P-site offset, is always a distribution, not a single value, even for reads of the same length. It is therefore beneficial to borrow information from neighbouring codons for detection of stalling events. The radius of this neighbourhood—which determines the loess ‘span’ parameter—can be changed by the user (default: 5). Median of loess-smoothed non-zero counts (\(M_{jk}^l\)) represents background CDS translation level, and the ratio \(\frac{{y_{ijk}^l}}{{M_{jk}^l}}\) shows excess or depletion of reads at any codon position compared with the background that is relative peak height. The geometric mean of this ratio among samples produces the bias coefficient for that position. If the goal of the study is to investigate local patterns of stalling between groups of samples, group-specific bias coefficients should be calculated. CELP coefficients or summary statistics derived from them can be then regressed against any position-specific (for example, RBP binding site or codon type), transcript-specific (for example, length or existence of known upstream open reading frames) or group-specific (for example, wild type versus tRNA KD cell line) factors to infer their effects on stalling. On the other hand, if CELP is primarily used to debias RPF counts to allow an unbiased TER test, all samples in the dataset can be pooled together in the calculation of bias coefficients.

TER test

Both RNA and RPF libraries are mapped to the same reference transcriptome as described in the previous section. RNA read counts per transcript are calculated directly from the bam files. RPF read counts are either obtained in the same way or run through CELP debiasing first to smooth out local non-uniformities (described above in detail). RNA and RPF ‘transcript × sample’ count matrices are normalized separately for library size variation using the median-of-ratios method62. We model TE as the odds of retrieving two different sequencing read types from a sample: RPF versus RNA. In this scenario, we hypothetically pool all the reads from an experiment, and then extract a read from this pool. The odds of extracting an RPF versus an RNA read from this pool yields a probabilistic estimation of TE. We compare TER by testing the effect of model covariates on TE, that is, the odds ratio of RPF/RNA between groups or per unit change of continuous predictors. In the very simple case of comparing only two non-replicated samples, a significance test on TER could be performed using a chi-square or Fisher’s exact test on a 2 × 2 contingency table with sample name acting as the exposure (independent variable) and the read type (RPF or RNA) as the response (dependent variable). Since most biological experiments are replicated and involve multiple sample groups, we generalize the test in a logistic regression setting:

$$\log {{{\mathrm{TE}}}} = \log \left( {\frac{{{\mathrm{RPF}}}}{{{\mathrm{RNA}}}}} \right) = \alpha + \mathop {\sum}\limits_i {\beta _iX_i}$$

where:

RPF: normalized (and optionally debiased) RPF read count

RNA: normalized RNA read count

α: intercept

Xi: predictor (independent variable) i

βi: regression coefficient for predictor (independent variable) i.

The test is run separately for each transcript. Independent variables could be categorical, for example, group labels, or continuous to represent a molecular measurement from the sample for example tRNA concentrations or a codon optimality score. This formulation of TER accommodates complex experimental designs with any number of groups or replicates described by any number of attributes (covariates). It can incorporate interaction terms, batch effect indicators or other confounding variables. A P value is reported for each regression coefficient indicating the significance of its effect (‘effect’ here is defined as a regression coefficient being different from 0, or the corresponding TER being different from 1). The effect sizes (logTER) and log10(P values) are plotted together to produce the familiar volcano plot. The expected TER of a transcript between two samples differing in one or multiple attributes can be estimated by substituting the obtained regression coefficients in the equation below:

$${\mathrm{TER}} = \frac{{{\mathrm{TE}}_2}}{{{\mathrm{TE}}_1}} = {{{\mathrm{exp}}}}\left\{ {\mathop {\sum}\limits_i {\beta _i} (X_{i,2} - X_{i,1})} \right\}$$

For detailed instructions to install the package, prepare the input data, run the tests, and interpret and plot the results, visit https://github.com/goodarzilab/Ribolog. Additional modules for quality control, empirical null significance testing to reduce false positives, meta-analysis of ribosome profiling data and so on are also available from the GitHub page. We used the data simulated by the authors of Xtail to benchmark Ribolog against four other commonly used tools (Xtail, Riborex, RiboDiff and Anota2seq). The results are available at https://github.com/goodarzilab/Ribolog/blob/master/benchmarks/ribolog_benchmarks.pdf.

APAlog

The RNA reads used to compare poly(A) site usage could originate from a regular RNA-seq or a specialized 3′ UTR sequencing protocol. In either case, normalized counts of reads mapped to each poly(A) site are used by APAlog to assess the extent and pattern of differential poly(A) site usage via multinomial logistic regression:

$$\log \left( {\frac{{{\mathrm{Alt.site}}}}{{ {\mathrm{Ref.site}}}}} \right) = \alpha + \mathop {\sum}\limits_i {\beta _iX_i}$$

where:

Alt.site: alternative poly(A) site normalized read count

Ref.site: reference poly(A) site normalized read count

α: intercept

Xi: predictor (independent variable) i

βi: regression coefficient for predictor (independent variable) i.

APAlog automatically sets the poly(A) site of each transcript that comes first alphabetically to reference. The user can specify which poly(A) site to serve as reference by adjusting the poly(A) site names in the count matrix. APAlog can be run in three modes: (1) Overall transcript-wise test: a deviance test is performed between the fitted model with covariates and the null (intercept-only) model. This test identifies transcripts that show differential poly(A) site selection among samples but does not specify which poly(A) sites or covariates contribute to the difference. This mode facilitates the quick scanning of a large multi-group dataset to flag putative targets of regulation. Moreover, by performing exactly one test per transcript, it avoids complications of multiple testing correction among transcripts with unequal number of poly(A) sites. (2) Alternatives versus reference test: One poly(A) site per transcript is marked as the reference site, and all others (one or more) are tested against it. This mode is suitable for specific applications such as testing 3′ UTR length variation when one poly(A) site, in this case the most proximal one, can be set to reference and all others compared with it. (3) Pairwise test: this test compares all pairs of poly(A) sites per transcript and provides the highest-resolution view of poly(A) site selection regulation. It is also the best choice if a reference or canonical poly(A) site cannot be logically assigned.

For detailed instructions to install the package, prepare the input data, run the tests and interpret the results, visit https://github.com/goodarzilab/APAlog.

Statistics and reproducibility

For in vivo experiments, mice were distributed into cohorts with five mice per cohort, which in NSG background is enough to observe a more than two-fold difference with 90% confidence. For other experiments, no statistical methods were used to calculate sample size.

No data were excluded from the analyses.

Cell migration/invasion assays were performed in four biological replicates. Co-IP–MS, TMT–MS, RT–qPCR and western blot experiments were performed in biological triplicates. Sequencing-based experiments (Ribo-seq, RNA-seq, CLIP-seq, PAPERCLIP and massively parallel reporter assays (MPRA)) were performed in biological duplicates.

Mice for in vivo experiments were randomly assigned into cohorts. For other experiments, no randomization was performed.

For cell migration/invasion assays, the person counting the colonies was blinded for the experimental conditions. For other experiments, the data were acquired and analysed by the same person.

Data distribution was assumed to be normal, but this was not formally tested.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.