Widespread transcript shortening through alternative polyadenylation in secretory cell differentiation

Most eukaryotic genes produce alternative polyadenylation (APA) isoforms. Here we report that, unlike previously characterized cell lineages, differentiation of syncytiotrophoblast (SCT), a cell type critical for hormone production and secretion during pregnancy, elicits widespread transcript shortening through APA in 3’UTRs and in introns. This global APA change is observed in multiple in vitro trophoblast differentiation models, and in single cells from placentas at different stages of pregnancy. Strikingly, the transcript shortening is unrelated to cell proliferation, a feature previously associated with APA control, but instead accompanies increased secretory functions. We show that 3’UTR shortening leads to transcripts with higher mRNA stability, which augments transcriptional activation, especially for genes involved in secretion. Moreover, this mechanism, named secretion-coupled APA (SCAP), is also executed in B cell differentiation to plasma cells. Together, our data indicate that SCAP tailors the transcriptome during formation of secretory cells, boosting their protein production and secretion capacity.

T he placenta functions as the lung and kidney of the fetus and produces hormones to maintain the pregnancy 1,2 . Derived from the trophectoderm layer of blastocyst, trophoblasts (TBs) are the main cell types that form the outer layer of placenta 3 . Among the three TB subtypes, proliferating villous cytotrophoblasts (VCTs) form the germative layer that gives rise to the other two types, extravillous trophoblasts (EVTs) and syncytiotrophoblasts (SCTs) 3 . EVTs invade the decidua to anchor the placenta to uterus and modify maternal vasculature 1 , whereas SCTs form the interface between the mother and fetus, where nutrient/waste exchange takes place 1 , and secrete a variety of hormones necessary for pregnancy, such as human chorionic gonadotropin (hCG), human placental lactogen, human placental growth hormone, progesterone, and estradiol 4 . While transcriptional regulation in TB differentiation has been extensively studied [5][6][7] , little is known about post-transcriptional control of gene expression in the process.
Almost all eukaryotic messenger RNAs (mRNAs) require cleavage and polyadenylation for 3ʹ end maturation 8,9 . Over 70% of mammalian genes harbor multiple polyadenylation sites (PASs), leading to expression of alternative polyadenylation (APA) isoforms 10,11 . Most APA sites are located in the 3ʹ most exon, resulting in isoforms with different 3ʹUTR sizes 11 . 3ʹUTR APA can play a substantial role in gene expression through regulation of sequence or structure motifs that impact aspects of mRNA metabolism, such as translation, stability and localization [12][13][14] . In addition, about 20% of human genes have APA sites located in introns, which additionally change coding sequences when used 11,15,16 .
The relative abundance of an APA isoform can vary widely across tissue types 17 . For example, brain tissues tend to express long 3ʹUTR isoforms, whereas blood and testis show the opposite trend [16][17][18][19] . In addition, increased cell proliferation rate has been associated with shortening of 3ʹUTRs through APA 20 , while 3ʹUTRs lengthen during cell differentiation and development 21,22 . Here, we report the unexpected finding of widespread transcript shortening through APA, including 3ʹUTR shortening and activation of intronic polyadenylation (IPA), during differentiation of SCTs. We examine several in vitro models as well as single placental cells in vivo. We further generalize this finding to B-cell differentiation.

Results
Trophoblast differentiation displays a distinct APA profile. To study APA regulation in different cell differentiation lineages, we analyzed RNA-seq data previously generated by Xie et al. 23 , involving differentiation of human embryonic stem cells (hESCs, H1 line) into neuronal projector cells, mesenchymal stem cells, mesendoderm, and TBs (illustrated in Fig. 1a) 23 . By comparing RNA-seq reads covering different portions of 3ʹUTR that were subject to APA regulation (illustrated in Fig. 1b and see "Methods" for details), we found that, consistent with previous reports 24,25 , differentiation of hESCs into neuronal projector cells, mesenchymal stem cells, or mesendoderm led to 3ʹUTR lengthening to variable degrees (Fig. 1c). By contrast, surprisingly, differentiation of hESCs into TBs showed 3ʹUTR shortening (Fig. 1c).  23 ). b Schematic of 3ʹUTR APA analysis using RNA-seq data. RNA-seq reads in 3ʹUTRs are divided into constitutive UTR (cUTR) and alternative UTR (aUTR) groups based on the first and last PASs, as indicated. 3ʹUTR length change for a gene is indicated by relative expression difference (RED) value between two samples or sample groups, as shown in the graph. RED is calculated as indicated. c Median REDs of four differentiation lineages as compared to hESC. A total of 8,712 genes were used for analysis. d Boxplots showing expression changes of proliferation genes (left; 350 genes; defined by Sandberg et al. 20 ) and development genes (right; 74 genes; defined by Ji et al. 21 ) in TB vs. hESC. P-values (Wilcoxon test) based on comparisons with other genes are indicated.
Increased cell proliferation could trigger 3ʹUTR shortening 20 . Interestingly, the proliferation genes that were previously found to be negatively correlated with 3ʹUTR size 20 were in fact globally downregulated during TB differentiation (Fig. 1d, left). Note that the same genes did indicate decreased cell proliferation in differentiation of mouse myoblast cell line C2C12 (Supplementary Fig. 1a), a model we have used to study 3ʹUTR lengthening in cell differentiation ( Supplementary Fig. 1b) 21,26 . We previously identified a gene set whose expression levels positively correlated with 3ʹUTR size in mouse embryonic development 21 . While expression of this development gene set significantly increased in C2C12 differentiation ( Supplementary Fig. 1a), they were also upregulated in hESC to TB differentiation (Fig. 1d, right), again indicating that 3ʹUTR size regulation in TB differentiation is different than previously studied differentiation and development systems.
Widespread transcript shortening in TB differentiation. To examine APA in TB differentiation in greater detail, we differentiated hESCs (H9 line) to TBs through the BMP4 induction protocol 27 , and subjected cellular RNA to 3ʹ Region Extraction and Deep Sequencing (3ʹREADS), a deep-sequencing method we previously developed to specifically interrogate 3ʹ ends of poly(A) + RNAs 11 (Fig. 2a). Based on the top two most abundant 3ʹUTR APA isoforms of each gene (Fig. 2b), we found that genes showing increased relative expression of short 3ʹUTR isoform greatly outnumbered those showing increased relative expression of long 3ʹUTR isoform by 11.4-fold in TBs vs. hESCs (1070 vs. 94, Fig. 2c), indicating a substantial and global trend of 3ʹUTR shortening. Based on the size of the 3ʹUTR sequence that is different between the two isoforms, named alternative UTR or aUTR (illustrated in Fig. 2b), we divided genes into five bins (Fig. 2d). Using relative expression difference (RED) to represent the degree of 3ʹUTR APA regulation (described in Fig. 2b), we found that genes tended to display a greater extent of 3ʹUTR shortening as aUTR size increased (Fig. 2d). For example, genes with an aUTR size >1836 nucleotides (nt, bin 5) showed much more conspicuous 3ʹUTR shortening than genes with an aUTR size <152 nt (bin 1, P = 3.4 × 10 −37 , Wilcoxon test, Fig. 2d).
A sizable fraction of APA sites are located in introns 11 , leading to IPA isoforms that have coding sequence changes in addition to 3ʹUTR alternations (illustrated in Fig. 2e). Using isoforms containing PASs in the 3ʹ terminal exons as reference, we found that IPA isoforms were generally upregulated in TBs vs. hESCs, with 1.7-fold more genes showing IPA activation than those showing suppression (172 vs. 100, Fig. 2f). Gene Ontology (GO) analysis indicated that the most significant biological processes associated with genes showing 3ʹUTR shortening appeared to be related to protein and RNA metabolism (Table 1), such as "regulation of catabolic process", "post-transcriptional gene regulation", and "protein localization to organelle". For genes showing IPA activation (Table 2), "regulation of mRNA catabolic process," "cellular response to acid chemical", and "cytoplasmic translational initiation" were the top enriched biological processes. Altogether, the 3ʹREADS result corroborated our initial finding of global 3ʹUTR shortening in TB differentiation, and additionally revealed IPA activation. These APA events could potentially impact a diverse array of biological processes.
APA regulation correlates with syncytiotrophoblast formation. As there are three TB subtypes, namely, VCT, EVT, and SCT, we next asked which TB subtype was responsible for APA regulation when differentiated from hESCs. Using gene expression datasets from four TB subtype studies 28-31 , we constructed a panel of 69 genes as biomarkers for three TB subtypes (Fig. 3a, and "Methods" for detail), with each subtype having one third of the genes (see Supplementary Table 1 for the full list). Using these marker genes, we found that differentiation of hESCs to TBs by BMP4 treatment was biased toward the SCT lineage ( Fig. 3b), suggesting that SCT differentiation might be chiefly responsible for the observed APA regulation. A previous study by Yabe et al. 32 separated hESC-derived TBs (induced by BMP4/A83-01/PD173073 or BAP) into three populations based on cell size, namely, < 40 μm, between 40 μm and 70 μm, and >70 μm (illustrated in Fig. 3c). As such, larger cells were more likely to be syncytialized SCTs than smaller cells. Indeed, using the RNA-seq data from the study and our TB subtype marker genes, we confirmed this notion ( Supplementary  Fig. 2a) and, importantly, found that 3ʹUTR sizes decreased as cell size increased (Fig. 3d). The largest cells (>70 μm) displayed significantly shorter 3ʹUTRs than did the smallest cells (< 40 μm) (P < 0.05, t-test, Fig. 3d), confirming that differentiation of SCTs elicits 3ʹUTR shortening. Note that while the three hESC to TB differentiation datasets used here involved different hESC lines and induction protocols, leading to differences in gene expression changes ( Supplementary Fig. 2b), all three converged on global 3ʹUTR shortening, highlighting the robustness of our finding.
In addition to the hESC differentiation model, human choriocarcinoma cell line BeWo is a well-established system for SCT formation 33,34 . Using RNA-seq data generated by Azar et al. 34 , in which BeWo cells were induced to fuse by forskolin (Fsk)(illustrated in Fig. 3e), we found that, as expected, SCT marker genes were significantly upregulated after two days of Fsk treatment ( Supplementary Fig. 3a). Importantly, 3ʹUTRs significantly shortened at the same time point (Fig. 3f). The extent of 3ʹUTR shortening in the BeWo model appeared similar to that in the hESC model, as indicated by their median 3ʹUTR APA REDs (-0.15 vs. -0.17, Fig. 3d, f) and RED distributions ( Supplementary Fig. 3b). In addition, the RNA-seq data-derived 3ʹUTR APA REDs from the BeWo model and the hESC model were much better correlated for the genes that showed 3ʹUTR shortening in the 3ʹREADS data (r = 0.40, blue dots in Fig. 3g) than for other genes containing 3ʹUTR APA sites (r = 0.26, gray dots in Fig. 3g), further indicating consistency between the two SCT formation models. Both 3ʹREADS and RNA-seq data for an example gene DNAJC3 (encoding DNAJ heat shock protein family member C3) are shown in Fig. 3h, where REDs and Pvalues highlight its consistent and significant 3ʹUTR shortening in all the models.
We also examined a mouse model of TB differentiation, in which ectopic expression of a constitutively active Hras mutant Hras Q61L in mouse ESCs led to formation of syncytial giant cells 35 . Using 3ʹREADS (three biological replicates, Supplementary Fig. 4a), we found that Hras Q61L expression in mouse ESCs elicited both global 3ʹUTR shortening (a 8.3-fold bias in gene number between shortened and lengthened genes, Supplementary  Fig. 4b) and IPA activation (a 20.9-fold bias in gene number, Supplementary Fig. 4c). Note that while the mESC model did not involve upregulation of human SCT subtype marker genes ( Supplementary Fig. 4d) or development genes ( Supplementary  Fig. 4e), cell proliferation genes were slightly downregulated ( Supplementary Fig. 4e). These results indicate that despite many differences between human and mouse TB models, they both display global 3ʹUTR shortening and IPA activation.
Single-cell analysis in vivo corroborates in vitro findings. Several recent studies have generated single-cell RNA-seq (scRNAseq) data from the placenta 29,30,36 , creating opportunities to interrogate APA in TBs in vivo. To address read paucity in singlecell data, which could lead to high noise levels for APA analysis 25 , we examined APA in different cell types using aggregated scRNAseq data. This method, named single-cell significance analysis of APA (scSAAP, illustrated in Fig. 4a and see Methods for detail), first clustered cells based on their gene expression profiles; TB subtypes were identified using the TB subtype marker gene panel; RNA-seq reads from all cells of the same type were then combined for 3ʹUTR APA analysis.
Applying scSAAP to three datasets corresponding to placental cells isolated at different stages of pregnancy, including the first 30,36 and third 29 trimesters, we found that SCTs consistently displayed shorter 3ʹUTRs than VCTs and EVTs (P < 0.05, t-test, Fig. 4b). Importantly, the genes showing 3ʹUTR shortening in the hESC model (based on 3ʹREADS data, Fig. 2c) displayed greater 3ʹUTR shortening in vivo (SCTs vs. VCTs) than other genes with 3ʹUTR APA sites (P = 4.0 × 10 −11 , Kolmogorov-Smirnov (K-S) test, Fig. 4c). The scRNA-seq data for DNAJC3 is shown in Fig. 4d, which matched well with bulk RNA-seq and 3ʹREADS data from in vitro models (Fig. 3h).
The single-cell transcriptome data could also be used to decipher relationships between cells at different differentiation  stages 37 . Using TB subtype marker genes, we divided cells of each TB subtype into two portions, "near" and "far", based on distance to the converged point of all subtypes (Fig. 4e). As such, cells in the far group of each subtype had higher expression levels of the corresponding marker genes, and thus could be considered more differentiated as compared to those in the near group. Interestingly, using the first trimester placental cell data from Vento-Tormo et al. 30 , we found that the far group in the SCT lineage had significantly shorter 3ʹUTRs as compared to the near group (P < 2.2 × 10 −16 , Wilcoxon test, Fig. 4f). By contrast, the far group of VCT lineage displayed longer 3ʹUTRs than its near group (P < 2.2 × 10 −16 , Wilcoxon test, Fig. 4f). Since the path from VCT far cells to SCT far cells could represent the full SCT differentiation lineage in placenta, this result supports the notion that SCT differentiation in vivo, like in in vitro models, involves progressive 3ʹUTR shortening. This notion is also supported by the third trimester placental cell data by Tsang et al. 29 ( Supplementary Fig. 5a), in which SCT far cells showed shortest 3ʹUTRs among all cell groups (Supplementary Fig. 5b and see Supplementary Fig. 5c for DNAJC3 example). Interestingly, the difference in 3ʹUTR size between VCT near and far cells was distinct from that in first trimester samples (Fig. 4f vs.  Supplementary Fig. 5b). Altogether, single-cell transcriptomic analysis of placental cells confirmed short 3ʹUTR expression in SCTs in vivo.
APA changes are coupled to secretion gene expression. While our analyses indicated 3ʹUTR shortening during SCT differentiation both in vitro and in vivo, intriguingly, we did not observe significant 3ʹUTR size changes during in vitro syncytialization of primary human trophoblast (PHT) cells, based on studies by Azar et al. 34 and Yabe et al. 32 (Fig. 5a). This is despite the fact that these cells underwent successful cell fusion 32,34 and upregulation of SCT marker genes ( Supplementary Fig. 6), suggesting that syncytialization process per se could be uncoupled  29 , four replicates in Vento-Tormo et al. 30 , and 20 randomly sampled data in Liu et al. 36  from APA regulation. This prompted us to explore other biological processes that might accompany APA changes in SCT differentiation. We found that, while similar in APA regulation, the hESC and BeWo models had substantial differences in gene expression changes, as indicated by a modest correlation coefficient (r = 0.26, Pearson Correlation, Fig. 5b). We thus reasoned that consistently regulated genes between these two divergent models might provide clues about some common underlying biological processes related to APA regulation. To this end, we identified 1071 upregulated genes and 521 downregulated genes common to both models (fold change > 1.2 and false discovery rate = 0.05, DESeq, Fig. 5b). GO analysis found several biological processes that were significantly enriched for upregulated genes (Fig. 5c). Notably, the top three biological processes appeared to be substantially more significant than others (all with P-value < 1.0 × 10 −9 , Fisher's exact test, Fig. 5c), including "response to endoplasmic reticulum stress", "Golgi vesicle transport", and "endomembrane system organization". By contrast, Biological processes enriched for commonly downregulated genes Significance of regulation is based on fold change > 1.2 and FDR = 0.05 (DESeq) in both models. The hESC model data correspond to > 70 μm TB cells vs. hESCs (Fig. 3c) and the BeWo model data correspond to Fsk + vs. Fsk- (Fig. 3e). Commonly regulated genes are highlighted. Red for upregulation and blue for downregulation. Pearson correlation coefficient (r) comparing gene regulation in the two models is indicated. c Bubble plot showing top GO terms (Biological Processes only) enriched for genes upregulated in both hESC and BeWo models. Each GO term is represented by a circle, whose size reflects the number of genes associated with the term, and also by a dot, whose size reflects the number of upregulated genes associated with the term. All sizes are relative to other circles in the plot. For example, the term "response to endoplasmic reticulum stress" is associated with 273 genes, of which 62 are upregulated. Definition of USPS (3ʹUTR size-related protein secretion) genes (157 in total) is indicated. d CDF curves for USPS gene expression changes in four datasets, including two in vitro syncytialization models shown in a and two single-cell datasets (SCT far vs. VCT far, Fig. 4) were much less significant based on P-values ( Supplementary  Fig. 7). We next isolated 157 genes that were commonly upregulated in both hESC and BeWo models and were also associated with the top three GO processes. As both models involved 3ʹUTR shortening and the top three processes were all related to expression or transport of secreted proteins 38,39 , we named these genes 3ʹUTR Size-related Protein Secretion (USPS) genes (listed in Supplementary Table 3). Supporting the connection between 3ʹUTR shortening and protein secretion, we found that USPS genes were significantly upregulated in placental SCT far cells vs. VCT far cells (both first trimester and third trimester data), but were not so during in vitro syncytialization of PHT cells (P < 2.2 × 10 −16 , K-S test, Fig. 5d). In fact, in the Azar et al. 34 data, which showed a mild 3ʹUTR lengthening (Fig. 5a), USPS genes were downregulated (Fig. 5d). Moreover, USPS gene expression levels were well correlated negatively with 3ʹUTR APA REDs in both the hESC model (Fig. 5e) and the BeWo cell model (Fig. 5f). Taken together, these data indicate that upregulation of genes involved in protein secretion functions, not cell syncytialization, is associated with 3ʹUTR shortening in SCT differentiation. We thus named this mechanism secretion-coupled APA (SCAP).
3ʹUTR shortening is associated with increased RNA abundance. We next wanted to understand the consequences of SCAP in SCT formation. We found that in both hESC and BeWo cell models, as well as in SCT far cells vs. VCT far cells in vivo, upregulated genes tended to show greater extent of 3ʹUTR shortening than genes without expression changes (P < 1 × 10 −12 , K-S test, Fig. 6a). Consistently, USPS genes in general displayed significant 3ʹUTR shortening in these conditions ( Supplementary Fig. 8). On the other hand, downregulated genes displayed opposite trends (P < 1 × 10 −13 , K-S test, Fig. 6a). Note that we used RNA-seq reads in coding regions for gene expression analysis and those in 3ʹUTRs for APA analysis, avoiding confounding calculation (i.e., the same reads used for both expression and APA analyses). This result indicates that 3ʹUTR size regulation is related to gene expression changes. Consistently, the genes that showed shortened 3ʹUTRs in TB differentiation as defined by the 3ʹREADS data (Fig. 2c) were significantly upregulated in both the hESC and BeWo cell models, as well as in SCT far cells vs. VCT far cells (P < 1 × 10 −3 , K-S test, Fig. 6b).
One way 3ʹUTRs could impact gene expression is through control of mRNA stability 40 . To explore this, we examined RNA stability of APA isoforms in BeWo cells. We metabolically labeled RNA with 4-thiouridine (4sU) for 1 h and compared the abundance of labeled RNAs, which represented newly made RNAs, with that of total cellular RNAs, which represented steady state RNAs (illustrated in Fig. 6c). As such, the ratio of abundance (steady state vs. newly made), named Stability Score, reflected transcript stability (Fig. 6c).
We found that long 3ʹUTR isoforms were generally less stable than short 3ʹUTR isoforms in BeWo cells by 9.4-fold (262 genes showing a long 3ʹUTR isoform being significantly less stable than a short isoform, and 28 genes showing the opposite trend, FDR = 0.05, DESeq, Fig. 6d). Importantly, as aUTR size increased, the stability difference between short and long 3ʹUTR isoforms became more evident (Fig. 6e). For example, for genes with an aUTR size >1836 nt (bin 5 in Fig. 6e), the median Stability Score difference was −0.61, whereas the difference was −0.16 for genes with an aUTR size <152 nt (P < 2.2 × 10 −16 between these two groups test, Wilcoxon test, Fig. 6e). For DNAJC3, a USPS gene in bin 5, its long 3ʹUTR isoform was substantially less stable than its short 3ʹUTR isoform (Stability Scores = -1.7 vs. 1.5, Fig. 6f).
In contrast to 3ʹUTR shortening genes, the genes that displayed IPA activation showed only modest differences in gene expression as compared to all genes ( Supplementary Fig. 9a). In addition, IPA isoforms were generally less stable than isoforms whose PASs were in the 3ʹ terminal exon (Supplementary Fig. 9b).
Using tetramer enrichment analysis, we found that several Urich motifs were enriched in the aUTRs that were removed by APA in SCT differentiation ( Supplementary Fig. 10a). Interestingly, these motifs were also associated with 3ʹUTRs of unstable transcripts in BeWo cells (Supplementary Fig. 10b). A general negative correlation could be discerned between tetramer enrichments in regulated aUTRs and in 3ʹUTRs for stability (Fig. 6g), indicating that removal of destabilizing motifs in aUTRs can potentially lead to increased expression for genes displaying 3ʹUTR shortening in SCT differentiation.
SCAP in differentiation of B cells to plasma cells. We next asked whether SCAP took place in other systems besides SCT differentiation. Differentiation of B cells to plasma cells was the first biological condition in which APA regulation was discovered 41,42 . Increasing protein secretion capacity from B cells to plasma cells is crucial for antibody secretion. We analyzed an RNA-seq dataset generated by Shi et al. 43 , which included four types of B cells, namely, B1 cells, marginal zone B cells, germinal center (GC) B cells, and follicular B cells, as well as three types of plasma cells, namely, spleen plasmablasts (PBs), spleen plasma cells, and bone marrow plasma cells. Among these cell types, spleen PBs and marginal zone B cells displayed the shortest and longest 3ʹUTRs, respectively, based on 3ʹUTR APA REDs (Fig. 7a, y-axis). Significantly, 3ʹUTR sizes correlated well with USPS gene expression across these cell types (r = −0.82, Pearson correlation, Fig. 7a). By contrast, proliferation gene expression levels were not correlated with 3ʹUTR sizes (r = -0.05, Pearson correlation, Fig. 7b).
The difference between GC B cells and spleen PBs is quite striking, with the former having the highest proliferation gene expression levels (Fig. 7b, x-axis) and the latter having the highest USPS gene expression levels (Fig. 7a, axis). Importantly, consistent with the SCAP model, GC B cells had longer 3ʹUTRs than PBs. To corroborate this finding, we analyzed a dataset by Ise et al. 44 , in which different types of GC B cells were isolated, including plasma cell fate B cells in the light zone, recycling cell fate B cells in light zone, and dark zone B cells. In addition, GCderived PBs were used as a control. In agreement with the data by Shi et al. 43 , PBs had shorter 3ʹUTRs than all GC B cells (Fig. 7c). Importantly, compared to dark zone B cells, which had the longest 3ʹUTRs among the three GC B-cell types (Fig. 7c), PBs had significantly higher expression of USPS genes and lower expression of cell proliferation genes (P = 2.2 × 10 −12 and = 1.3 × 10 −6 , respectively, K-S test, Fig. 7d). This result confirms that USPS gene expression is a good indicator of 3ʹUTR size in Bcell differentiation to plasma cells.
We next took an exploratory approach to identify genes whose expression changes were related to 3ʹUTR size differences across the B cells and plasma cells analyzed in the Shi et al. 43 study. Interestingly, using Pearson Correlation analysis (example cases shown in Supplementary Fig. 11a), we found an enrichment of genes with negative correlation coefficients (note the uneven data distribution in Supplementary Fig. 11b). GO analysis of the genes with the most negative correlation coefficients (r < -0.79, bottom 5% of all genes) revealed that these genes were highly enriched with functions in protein secretion, including "negative regulation of response to endoplasmic reticulum stress" and "protein exit from endoplasmic reticulum" (Supplementary Fig. 11c), further supporting the SCAP mechanism in B-cell differentiation.
Perturbation of cell secretion further confirms SCAP. We next wondered whether perturbation of cellular secretion capacity could lead to 3ʹUTR size changes. To this end, we analyzed an RNA-seq dataset involving overexpression (OE) of active forms of XBP1 (spliced form) and Creb3l2 (cleaved form) in the mouse pituitary tumor cell line AtT-20 45 . OEs of these factors individually or jointly led to enhancement of secretory capacity to various degrees 45 . As expected, Creb3l2 OE, which was reported to control expression of translation factors 45  P-values (K-S test) for significance of difference between red or blue genes and black genes are indicated. b CDF curves of gene expression changes for genes showing 3ʹUTR shortening in the 3ʹREADS data (Fig. 2c, blue line) and all genes (black line) in three datasets (same as in a). P-values are based on the K-S test comparing blue and black genes. c Schematic of mRNA stability analysis of BeWo cells. mRNA stability of each transcript is represented by a Stability Score, as indicated. d Scatter plot comparing Stability Score of short 3ʹUTR isoform (x-axis) with that of long 3ʹUTR isoform (y-axis). Genes whose isoforms have significantly different Stability Scores (P < 0.05, DEXSeq) are highlighted in color. e Relationship between aUTR size and Stability Score difference between long and short isoforms. Genes are grouped into five similarly sized bins (about 550 genes each) based on their aUTR length. P-value (Wilcoxon test) comparing Bins 1 and 5 is indicated. Data are presented as median value ± SEM. f UCSC Genome Browser tracks showing 3ʹREADS data for DNAJC3 in total and 4sU-labeled RNA samples. g Tetramer enrichment scores for aUTRs of genes with shortened 3ʹUTRs (Fig. 2c) vs. aUTRs of other genes (x-axis) are compared to enrichment scores for 3ʹUTRs of stable vs. unstable transcripts (y-axis). Fig. 7e, left). By contrast, XBP1 OE, which upregulates endoplasmic reticulum (ER) biogenesis 45 , led to significant upregulation of USPS genes (median log 2 Ratio = 0.15, Fig. 7e, middle). Interestingly, co-OE of both factors led to the greatest upregulation of USPS genes (median log 2 Ratio = 0.22, Fig. 7e, right), indicating synergy between these two factors. Consistent with the SCAP model, 3ʹUTR shortening was significant in cells with XBP1 OE and co-OE of XBP1 and Creb3l2 (Fig. 7f), with the latter being greater in magnitude than the former. By contrast, Creb3l2 OE did not elicit significant 3ʹUTR shortening (Fig. 7f). Notable also is that proliferation genes were slightly downregulated in all OE samples (orange lines in Fig. 7e), arguing against the possibility that the observed 3ʹUTR shortening was due to increased cell proliferation. Therefore, forced enhancement of secretion capacity through XBP1 or XBP1 + Creb3l2 OEs could also elicit 3ʹUTR shortening, lending a strong support to the SCAP mechanism.

Discussion
In this study, we reveal secretion-coupled APA, or SCAP (illustrated in Fig. 8), involving global 3ʹUTR shortening and IPA activation in SCT differentiation. This mechanism is also in play during differentiation of B cells to plasma cells, indicating its generality in secretory cell formation. We further show that 3ʹUTR shortening is coupled with upregulation of gene expression, due likely to prolonged transcript half-life through removal of destabilizing sequences in 3ʹUTRs. We thus advocate that SCAP is an integral part of gene expression program in secretory cells, tailoring their transcriptome for enhanced protein production and secretion. We show that SCAP is different from cell proliferation-elicited APA, which has been implicated in 3ʹUTR shortening in T-cell activation 20 and cancer cell development 46 , or developmentassociated APA, which is executed in embryonic development and many cell differentiation lineages 21 . These conclusions are based largely on expression patterns of USPS genes, proliferation genes, and development genes. However, during the course of this study, we also noticed that gene expression levels of USPS genes and proliferation genes could be correlated in some systems. For example, in T-cell activation, where there is global 3ʹUTR shortening 20 (Supplementary Fig. 12a), both USPS genes and proliferation genes are activated (Supplementary Fig. 12b). A similar observation was made using data from a set of NCI-60 human tumor cell lines 47 (Supplementary Figs. 12c, d). Therefore, in most cells, secretion capacity (represented by USPS genes) appears to be coupled with cell proliferation. An open question that requires further investigation, therefore, is whether it is cell proliferation or protein secretion that is the primary driver for APA regulation in these non-secretory cells.
While 3ʹUTR shortening is widespread in secretory cell differentiation, it appears to impact secretion genes to a greater extent than other genes, such as those involved in ER stress response, ER organization, and secretory pathways. One ramification of SCAP, therefore, is amplification of transcriptional activation of secretion genes through stabilization of transcripts, leading to rapid accumulation of their RNAs in secretory cell differentiation. A case in point is DNAJC3, whose protein product (also known as P58 IPK ) plays critical roles in mitigating ER stress, including prevention of translational inhibition caused by stressinduced eIF2α phosphorylation 48 and function as a co-chaperone to protect stressed ER 49 . Mutations of DNAJC3 have been implicated in diabetes and neurodegeneration 50 . Here, we show significant 3ʹUTR shortening of DNAJC3 in SCT differentiation, and its two 3ʹUTR isoforms are conspicuously different in size and stability. It is thus conceivable that APA could augment DNAJC3 protein expression to preemptively address potential ER stress when differentiating secretory cells ramp up their secretion capacity.
Another possible impact of 3ʹUTR shortening is alternation of mRNA localization, such as the association with ER 51 or other organelles 52 . This may lead to modulation of localized translation. The ER interaction may deserve special attention, given our observation that overexpression of XBP1, an activator of ER biogenesis, is sufficient to cause global 3ʹUTR shortening in the pituitary tumor cell line AtT-20. In this vein, it would be worthwhile to test how perturbation of APA might alter ER metabolism in secretory cells.
It is also worth noting that the genes encoding secreted proteins do not necessarily display APA. For example, many hormone-encoding genes, such as hCG genes (the major hormones secreted by SCTs), contain only one PAS in their 3ʹUTR (Supplementary Table 4). Therefore, 3ʹUTR shortening in general functions to facilitate the growth of secretory capacity in the cell rather than to help the production of individual secreted proteins.
Our result also suggests that some genes may employ APA to express protein isoforms through intronic PAS usage, which is globally activated during SCT differentiation. Notably, intronic PAS activation of the immunoglobulin M heavy chain gene in Bcell differentiation is one of the first reported APA events 41,42 . This regulation switches expression of a membrane-associated protein isoform in B cells to a secreted isoform in plasma cells. How widespread is protein isoform switch in SCT differentiation and in other secretory systems needs to be further delineated.
Global 3ʹUTR shortening could be caused by activated usage of proximal PASs during pre-mRNA processing (processing-driven) or by enhanced degradation of long 3ʹUTR isoforms (degradation-driven). We think the former is more likely in SCT differentiation because of two reasons. First, genes that display shortened 3ʹUTRs tend to have increased RNA abundance, making the processing-driven model more plausible. If the degradation-driven mechanism was the underlying cause, the RNA abundance increase could only be explained by a nonparsimonious scenario where enhanced RNA degradation is countered by transcriptional activation. Second, more importantly, we observed increased expression of IPA isoforms during SCT differentiation, which is in line with the notion that 3ʹ end processing activity is activated. Note that IPA isoforms are typically less stable than isoforms using 3ʹ terminal exon PASs, further lending a support to the processing-driven model. A growing number of factors have been implicated in global regulation of APA 12 , including core cleavage and polyadenylation factors 53 , U1 snRNP 54 , transcriptional elongation 55 , nuclear export 56 , etc. How these mechanisms are in play in secretory cell differentiation is an open question. Notably, upregulation of CstF64, one of the core 3ʹ end processing factors, was previously found to be regulate APA in B-cell differentiation 57 . Whether different secretory cells share the same APA regulatory mechanism needs to be explored in the future.

Methods
Cell culture. Human embryonic stem cells (hESCs) were differentiated into the trophectoderm lineage by using the BMP4 method 27 . BeWo choriocarcinoma cell line was a gift from the Sergei Kotenko lab at Rutgers University. BeWo cells were cultured in Dulbecco's modified Eagle's medium/F12 (DMEM/12) supplemented with 10% fetal bovine serum (FBS). Syncytialization of BeWo cells was induced by treating cells with 50 μM Forskolin (Fsk, Sigma-Aldrich) for 72 h. Mouse embryonic stem cells with doxycycline-inducible Hras mutant Hras Q61L were cultured on a tissue culture dish precoated with 0.5% gelatin in media containing leukemia inhibitory factor (LIF) 35 . At the time of induction with doxycycline (24 h after LIF removal), culture media was replaced with Dulbecco's modified Eagle's medium/F12 (DMEM/F12) containing 10% FBS. All culture media also contained 100 IU/ml penicillin and 100 μg/ml streptomycin.
Real-time quantitative PCR (RT-qPCR). Total RNA was extracted by using TRIzol (Thermo Fisher) and treated with Turbo DNase (Thermo Fisher). Complementary DNA (cDNA) was synthesized by using oligo(dT) and M-MLV reverse transcriptase with 1.5-2 µg of RNA. cDNA was then mixed with real-time PCR primers and Luna qPCR master mix (NEB). qPCR was run on an Applied Biosciences StepOne Plus Real-Time PCR system.
Isolation of newly made RNAs. BeWo cells were cultured in medium supplemented with 50 μM of 4-thiouridine (4sU; Sigma) for 1 h before harvest. Total RNA (steady state) was extracted by using TRIzol. Newly made (4sU-labeled) RNAs were fractionated following the protocol described in ref. 58 . Briefly, 100 μg of total RNA was biotinylated by using biotin-HPDP (1 μg/μl in Dimethylformamide; Thermo Fisher Scientific), and then extracted with chloroform three times and precipitated with ethanol. The biotinylated RNA was captured by Streptavidin C1 Dynabeads. After washing for six times, biotinylated RNA was eluted by DTT. Both newly made and steady state RNAs were precipitated with ethanol and then used for 3ʹREADS analysis.
3ʹREADS and data processing. The 3ʹREADS or its newer version 3ʹREADS + procedures were performed to sequence the 3ʹ end fragment of poly(A) + RNAs 11,59 . Libraries were sequenced on an Illumina GAIIx (1 × 72 nt) or HiSeq (1 × 150 nt) platform. 3ʹREADS/3ʹREADS + data were then processed to identify polyadenylation site-containing reads, or PAS reads 59 . Briefly, the sequence corresponding to 5ʹ adaptor was first removed from raw reads by using Cutadapt 60 . Reads with short inserts (< 23 nucleotides) were discarded. The remaining reads were then mapped to the genome (hg19 for human or mm9 for mouse) by using bowtie2 (local mode) 61 . The 5ʹ random nucleotides derived from the 3ʹ adaptor were removed before mapping. Reads with a mapping quality score (MAPQ) ≥ 10 were kept for further analysis. Reads with ≥2 non-genomic 5ʹ-Ts after alignment were called PAS reads. PASs within 24 nucleotides from each other were clustered 11 . The PAS read counts of genes were normalized by the median ratio method in the DESeq program 62 .
Analysis of 3ʹREADS data. For 3ʹUTR APA analysis, the two most abundant APA isoforms (based on PAS reads) whose PASs are in the 3ʹUTR of the last exon were selected. They were named proximal PAS (pPAS) and distal PAS (dPAS) isoforms. Significant APA events were those with relative abundance change > 5% and Pvalue < 0.05 (Fisher's exact test or DEXSeq analysis) between samples. Relative Expression Difference (RED) was calculated as the difference in log2(RPM ratio) of the two APA isoforms between two samples. The aUTR size was the distance between the two APA sites in the 3ʹUTR. For Intronic APA analysis, both PAS reads in intronic regions (IPA isoforms) and PAS reads in the last exon (TPA isoforms) were respectively summed and compared. Significant APA events were those with relative abundance change > 5% and P-value < 0.05 (Fisher's exact test or DEXSeq analysis) between samples.
Analysis of APA using RNA-seq data. 3ʹUTR APA analysis using RNA-seq data were carried out by the SAAP-RS method 25 . Briefly, reads were aligned to the human genome with STAR v2.5.2 63 by using default settings. Raw bam files were further processed by using the R packages RSamtools (for processing bam files), GenomicAlignments (for counting reads), and GenomicFeatures (for defining genomic regions). Human PAS locations were obtained from PolyA_DB3 64 . Relative expression difference (RED) was calculated as difference in log2(aUTR read number/cUTR read number) between two samples. Significance of difference was analyzed by either DEXSeq when there were replicates or the Fisher's exact test when there were no replicates. When there were no replicates, standard deviation was obtained by sampling data with a bootstrapping method for 20 times 53 .
Gene sets used in this study. For the TB subtype marker gene panel, four trophoblast studies were used [28][29][30][31] . Each marker gene for a specific TB subtype was reported by at least two studies and had no conflict between studies. Proliferation gene set was based on the study by Sandberg et al. 20 . Development gene set contained the genes whose expression levels positively correlated with 3ʹUTR size in mouse embryonic development, as defined in the study by Ji et al. 21 (called PCS in that study). The USPS gene set defined in this study contained genes whose expression levels increased in both hESC and BeWo models and were associated with the top three GO terms for upregulated genes in the two models.
Motif analysis. The frequencies of tetramers in specific regions were calculated and compared between different gene sets. Motif enrichment scores were calculated by -log 10 (P-values), where P-values were based on the Fisher's exact test.
Gene expression analysis using single-cell RNA-seq data. Droplet-based singlecell sequencing data 29,30 were processed by using the Cell Ranger Single-Cell Software Suite (v3.0.1, 10x Genomics). SMART-Seq2 data were processed as described by Tsang et al. 36 Briefly, raw reads were assigned to cells based on cellspecific barcodes in read 2 of paired-end reads. Read 1 reads were trimmed by using CutAdapt 60 to remove the template switch oligo sequence and the poly(A) tail sequence. All reads were aligned to hg19 genome by using STAR (v2.5.2) 63 .
Single-cell significance analysis of alternative polyadenylation (scSAAP). Single-cell reads were mapped to the hg19 genome by using STAR 63 . Single cells were clustered with the Seurat package 65 . TB clusters were extracted and subtyped based on the TB marker gene panel. Reads from single cells in each cell cluster were summed, and each cell cluster was treated as a bulk RNA-seq sample for 3ʹUTR APA analysis. Pseudotime analysis was performed by using the monocle package 37 . The TB marker panel was used to order single cells along the TB subtype branches. The cells in each TB subtype branch were then divided equally in cell number into two groups ("near" and "far" based on distance from the center).
Gene ontology analysis. Gene ontology (GO) analysis was carried out by using the GOstats Bioconductor package 66 , where the hypergeometric test was performed without multiple comparison adjustment. Generic terms (associated with >1000 genes) were discarded. To reduce redundancy in reporting, any GO term with a greater than 75% gene overlap with a more significant term was discarded.
Statistics and reproducibility. The Student's t-test was used to determine statistical significance between groups; all tests are two-sided. The Fisher's exact test or DEXSeq were used to determine the significance of APA changes. The K-S test was used to compare distributions between gene sets. The Wilcoxon test was used to compare gene expression changes in different gene sets; all tests are two-sided. Boxplots were generated with the top bound defined as the third quartile, the bottom bound defined as the first quartile, and the center defined as the median. Where shown, the following symbols indicate statistical significance: n.s., P ≥ 0.05; *P < 0.05; **P < 0.01, ***P < 0.001.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Sequencing datasets that were generated in this study have been deposited into the NCBI GEO database under the accession number GSE138759. All datasets used in this study are listed in Supplementary Table 5. All data is available from the corresponding author upon reasonable request.