Letter

Single-cell transcriptomics reconstructs fate conversion from fibroblast to cardiomyocyte

Nature volume 551, pages 100–104 (02 November 2017)


Direct lineage conversion offers a new strategy for tissue regeneration and disease modelling. Despite recent success in directly reprogramming fibroblasts into various cell types, the precise changes that occur as fibroblasts progressively convert to the target cell fates remain unclear. The inherent heterogeneity and asynchronous nature of the reprogramming process render it difficult to study this process using bulk genomic techniques. Here we used single-cell RNA sequencing to overcome this limitation and analysed global transcriptome changes at early stages during the reprogramming of mouse fibroblasts into induced cardiomyocytes (iCMs)1,2,3,4. Using unsupervised dimensionality reduction and clustering algorithms, we identified molecularly distinct subpopulations of cells during reprogramming. We also constructed routes of iCM formation, and delineated the relationship between cell proliferation and iCM induction. Further analysis of global gene expression changes during reprogramming revealed unexpected downregulation of factors involved in mRNA processing and splicing. Detailed functional analysis of the top candidate splicing factor, Ptbp1, revealed that it is a critical barrier for the acquisition of cardiomyocyte-specific splicing patterns in fibroblasts. Concomitantly, Ptbp1 depletion promoted cardiac transcriptome acquisition and increased iCM reprogramming efficiency. Additional quantitative analysis of our dataset revealed a strong correlation between the expression of each reprogramming factor and the progress of individual cells through the reprogramming process, and led to the discovery of new surface markers for the enrichment of iCMs. In summary, our single-cell transcriptomics approaches enabled us to reconstruct the reprogramming trajectory and to uncover intermediate cell populations, gene pathways and regulators involved in iCM induction.


Direct cardiac reprogramming that converts scar-forming fibroblasts into iCMs shows promise as an approach to replenish lost cardiomyocytes in diseased hearts1,2,3,4. Considerable efforts have been made to improve the efficiency and unravel the underlying mechanism5,6,7,8,9,10,11,12,13,14,15. However, it remains unknown how fibroblasts are converted into cardiomyocytes without following the conventional process of cardiomyocyte specification and differentiation. This is partly because the starting fibroblasts exhibit molecular heterogeneity that is mostly uncharacterized, and the reprogramming population contains fully converted, partially converted and unconverted cells. Traditional population-based genome-wide approaches are incapable of resolving this unsynchronized cell-fate-switching process. Therefore, we leveraged the power of single-cell transcriptomics to better investigate the reprogramming of iCMs that is mediated by Mef2c, Gata4 and Tbx5.

Previous studies have indicated that a snapshot of an unsynchronized biological process can capture cells at different stages of the process16. Because the emergence of iCMs occurs as early as day 3 (refs 1, 11, 12, 13, 14, 15), we reasoned that day-3 reprogramming fibroblasts contain a wide spectrum of cells that are transitioning from fibroblast to iCM. We therefore performed single-cell RNA sequencing (RNA-seq) on day-3 cardiac fibroblasts infected with separate Mef2c, Gata4 and Tbx5 viral constructs (hereafter M + G + T) from seven independent experiments (for experimental design, see Extended Data Fig. 1), followed by a series of quality control steps (Extended Data Fig. 1, Methods and Supplementary Tables 1, 2). Extensive data normalization was performed to correct for technical variations and batch effects (Extended Data Figs 1, 2 and Methods). After comparing the entire set of single-cell RNA-seq data to bulk RNA-seq data of endogenous cardiac fibroblasts and cardiomyocytes that were obtained from parallel experiments, we detected a group of resident or circulating immune or immune-like cells (Extended Data Fig. 3) that were not included in the subsequent analyses.

Unsupervised hierarchical clustering and principal component analysis (PCA) on the remaining 454 non-immune cells revealed three gene clusters that account for most of the variability in the data: cardiomyocyte-, fibroblast- and cell-cycle-related genes (Fig. 1a, b and Extended Data Fig. 4a–c). On the basis of the expression of cell-cycle-related genes, the cells were grouped into cell-cycle-active (CCA) and cell-cycle-inactive (CCI) populations (Fig. 1a); this was confirmed by the molecular signature of the cells in their proliferation states (Extended Data Fig. 4d–g, proliferating or non-proliferating). Within CCA and CCI populations, hierarchical clustering further identified four subpopulations based on differential expression of fibroblast versus cardiomyocyte genes: fibroblasts, intermediate fibroblasts, pre-iCMs and iCMs (Fig. 1a). When plotted using PCA or t-distributed stochastic neighbour embedding analysis, a stepwise transcriptome shift from fibroblast to intermediate fibroblast to pre-iCM to iCM was evident (Fig. 1c and Extended Data Fig. 4h, i). We also analysed the reprogramming process as a continuous transition using SLICER (selective locally linear inference of cellular expression relationships)17, an algorithm for inferring nonlinear cellular trajectories (Fig. 1d, e). The trajectory built by SLICER suggested that fibroblasts, intermediate fibroblasts, pre-iCMs and iCMs form a continuum on the bottom CCI path, representing an iCM-reprogramming route. We further calculated the pseudotime for each cell on the trajectory by defining a starting fibroblast and measuring the distance of each single cell to the starting cell along the reprogramming route (Fig. 1e). We then examined the distribution of cells along the pseudotime line by plotting the ‘free energy’ (max(density) − density) of the trajectory and discovered a peak (lowest density) in pre-iCM state (Fig. 1f). 
These data suggest that the pre-iCM stage is an unstable cell state seeking to settle into a more stable state, such as the iCM state, consistent with the PCA and hierarchical clustering analysis showing that pre-iCMs express both cardiomyocyte and fibroblast markers as an intermediate cell type and our other experimental evidence (Fig. 1a–c and Extended Data Fig. 4j–o). To experimentally test the iCM route, we performed population-based gene expression profiling at reprogramming days 0, 3, 5, 7, 10 and 14 (Fig. 1g, h and Extended Data Fig. 4p–v). PCA generated a pattern showing an oriented path during reprogramming (Fig. 1g and Extended Data Fig. 4p–s). Expression of the three main gene clusters selected from single-cell data showed consistent changes in population data (Fig. 1h and Extended Data Fig. 4t–v), supporting the SLICER trajectory.
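
The 'free energy' profile described above, max(density) − density along pseudotime, can be illustrated with a minimal sketch. It assumes a simple histogram estimate of cell density (the density estimator actually used is not specified in this section); the function name `free_energy`, the bin count and the toy pseudotime values are hypothetical and are not taken from the dataset.

```python
def free_energy(pseudotimes, n_bins=5):
    """Return (bin_centres, energy), where energy = max(density) - density.

    Assumption: density is estimated with equal-width histogram bins.
    """
    lo, hi = min(pseudotimes), max(pseudotimes)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all cells share one pseudotime; avoid division by zero

    counts = [0] * n_bins
    for t in pseudotimes:
        # Clamp the right edge so the maximum value falls in the last bin.
        idx = min(int((t - lo) / width), n_bins - 1)
        counts[idx] += 1

    total = len(pseudotimes)
    density = [c / (total * width) for c in counts]
    peak = max(density)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    energy = [peak - d for d in density]
    return centres, energy


# Hypothetical pseudotime values: many cells at both ends, few in the middle.
toy_pseudotimes = [
    0.0, 0.03, 0.07, 0.11, 0.13, 0.17,
    0.23, 0.31, 0.37,
    0.5,
    0.63, 0.71, 0.77,
    0.83, 0.87, 0.91, 0.93, 0.97, 1.0,
]
centres, energy = free_energy(toy_pseudotimes, n_bins=5)
```

In this toy example the sparsely populated middle bin has the highest free energy, which mirrors how a low-density region of the trajectory is read as an unstable, transitional state.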

Figure 1: Single-cell RNA-seq reconstructs iCM reprogramming and identifies intermediate cell populations.

a, Hierarchical clustering results of 454 single cardiac fibroblasts that were infected with M + G + T or that were mock- or DsRed-infected for 3 days with representative gene ontology terms of the three identified gene clusters underneath. ECM, extracellular matrix; Fib, fibroblast; H, high; iFib, intermediate fibroblast; L, low; M, medium; Pos. reg. of SMC prolif., positive regulation of smooth muscle cell proliferation. For P values, see Extended Data Fig. 4. b, c, PCA showing representative genes (b) or cell groups (c). PC, principal component. In b, cell cycle genes are shown in orange, cardiomyocyte markers in red and fibroblast markers in blue. d, e, Three-dimensional trajectory constructed by SLICER showing hierarchical clustering/PCA cell groups (d) or pseudotime (e). LLE, local linear embedding; NP, non-proliferating; Pro, proliferating. f, Free energy of the reprogramming process. g, h, Microarray of MGT- or LacZ-transduced cardiac fibroblasts from day 0 to 14 plotted as a PCA plot (g) or heat map (h) showing the mean expression of representative genes from a, b. CM, cardiomyocyte markers; Fib, fibroblast markers. i, Comparison of the CCA:CCI ratio in intermediate fibroblasts, pre-iCMs and iCMs. j–p, Cell-cycle synchronization (j–l) or immortalization (m–p) of cardiac fibroblasts for iCM induction (see Methods). CF, cardiac fibroblast; flow, flow cytometry; ICC, immunocytochemistry; noco, nocodazole; puro, puromycin; zeo, zeocin. j, Schematic of the cell-cycle synchronization experiment. k, l, Quantification of flow cytometry analysis. Fold change in the number of positive cells after nocodazole treatment compared to DMSO (k) or low serum treatment compared to normal serum levels (l). n = 4 samples. n–p, Representative 40× images of α-actinin and cardiac troponin T (cTnT) with Hoechst are shown in n and the quantification is shown in o, p. o, Fold change in the percentage positive CF-T cells compared to cardiac fibroblasts (CF).
p, Fold change in the number of positive CF-T cells per field compared to cardiac fibroblasts (CF). n = 30 images. Scale bars, 100 μm. Data are mean ± s.e.m., two-sided Student’s t-test: *P < 0.05, **P < 0.01, ***P < 0.001.

By analysing CCA and CCI populations, we found that even though proliferative iCMs (CCA iCMs) were observed (Fig. 1a–d), iCMs and pre-iCMs were predominantly CCI (Fig. 1i). We therefore designed four sets of experiments to address the relationship between cell proliferation and iCM reprogramming by: (1) manipulating the expression of cell-cycle-related genes in fibroblasts that were lentivirally transduced with M + G + T (Extended Data Fig. 5a–p) or infected with a single doxycycline-inducible MGT construct (Extended Data Fig. 5q–s); (2) synchronizing the cell cycle of starting cardiac fibroblasts (Figs 1j–l); (3) transiently overexpressing large T antigen to accelerate cardiac fibroblast proliferation (Extended Data Fig. 5t–z); (4) establishing an immortalized cardiac fibroblast (CF) line CF-T (see Methods) before the initiation of iCM reprogramming (Fig. 1m–p). All four sets of experiments yielded consistent results showing that decreased proliferation or cell-cycle synchronization enhanced iCM reprogramming, whereas increased proliferation suppressed iCM generation.

We next examined the cellular composition of our isolated starting cardiac fibroblasts (see Methods) and identified five subpopulations (Fig. 2a, b, Extended Data Fig. 6a–i and Supplementary Discussion 1). To delineate how these subpopulations were reprogrammed, we applied hierarchical clustering calculated from the starting cardiac fibroblasts to those that had been transduced with M + G + T and determined the correlation of the expression of non-cardiomyocyte lineage markers to the status of reprogramming (Fig. 2c). Expression of both endothelial and epicardial genes was significantly decreased in all cells that were transduced with M + G + T, irrespective of the reprogramming status. However, fibroblast and myofibroblast and/or smooth muscle genes were suppressed in iCMs, but not in intermediate fibroblasts or pre-iCMs (Fig. 2c and Extended Data Fig. 6j, k); this finding was supported by experimental data tracking protein expression of representative markers along the reprogramming trajectory (Fig. 2d–f and Extended Data Fig. 6l, m). Therefore, we conclude that endothelial and epicardial genes can be readily suppressed, whereas fibroblast and myofibroblast and/or smooth muscle genes are gradually suppressed over the course of reprogramming. This differential suppression is consistent with the difference in the layer of origin among different cardiac cell lineages during development and suggests that recent (epigenetic) memories might be easier to erase than ones that have been gained earlier in development. The progressive suppression of fibroblast markers also indicates that there is a difference between iCM and induced pluripotent stem cell (iPS cell) reprogramming, because early downregulation of fibroblast markers, such as Thy1, is one of the hallmarks and prerequisites for iPS cell reprogramming to proceed18.

Figure 2: Heterogeneity of cardiac fibroblasts and stepwise suppression of non-cardiomyocyte lineages during iCM induction.

a, b, Hierarchical clustering (a) and PCA (b) of control cardiac fibroblasts with representative gene expression and gene ontology analysis of the five identified gene clusters. c, Hierarchical clustering calculated with control cardiac fibroblasts (a) applied to M + G + T-transduced cells with representative gene expression. Epi, epicardial; Endo, endothelial. d–f, Representative 40× immunocytochemistry images (d, e) and quantification (f) of Thy1 or SM22α (the protein product of the gene Tagln) and α-MHC–GFP during reprogramming. d, day. n = 20 images. Scale bars, 100 μm. Data are mean ± s.e.m. In violin plots, box plots were included inside the plot. The centre dot represents median gene expression and the central rectangle spans the first quartile to the third quartile of the data distribution. The whiskers above or below the box show the locations of 1.5× interquartile range above the third quartile or below the first quartile.

To understand the molecular cascades that underlie iCM induction, we performed nonparametric regression and k-medoid clustering (see Methods), and identified three major clusters of genes that are significantly related to and show similar trends during reprogramming (Extended Data Fig. 7a–d). Further analysis identified six smaller gene clusters with narrower variation across the trend and gene ontology analyses were performed for each cluster (Fig. 3a–g, Supplementary Table 3 and Supplementary Discussion 2). The largest cluster (cluster 1), which shows a trend of immediate and continuous downregulation of gene expression, is enriched in gene ontology terms related to protein translation/biosynthesis, modification and transportation (Fig. 3b). Such changes probably serve to balance increased energy requirements during the cell-fate switch and/or to support the transition from a protein production and ‘secretion factory’ (a fibroblast) to an energy-consuming ‘power station’ (a cardiomyocyte). The downregulated genes in cluster 2 are enriched in gene ontology terms that suggest a late suppression of fibroblast genes and growth factors, whereas the upregulated genes in clusters 4 and 5 are enriched in gene ontology terms that indicate engagement in a metabolic shift and structural changes towards a cardiomyocyte fate (Fig. 3e, f).
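
As an illustration of the clustering step, the following is a minimal k-medoid sketch that groups gene expression trends by shape. It is an assumption-laden toy, not the procedure from the Methods: Euclidean distance, the naive initialisation, the function names `assign` and `k_medoids`, and the six hypothetical trend vectors are all chosen here for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, medoids, distance):
    """Assign every point index to its nearest medoid index."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: distance(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, distance=dist, max_iter=100):
    """Alternate between assignment and medoid update until stable.

    Assumption: the first k points seed the medoids (a naive,
    deterministic initialisation used only for this sketch).
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        updated = []
        for m, members in clusters.items():
            if not members:  # keep a medoid whose cluster became empty
                updated.append(m)
                continue
            # The new medoid minimises total distance to its cluster members.
            best = min(
                members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members),
            )
            updated.append(best)
        if set(updated) == set(medoids):
            break
        medoids = updated
    return sorted(medoids), assign(points, medoids, distance)


# Hypothetical fitted trends at three pseudotime points:
# rows 0-2 decrease along the trajectory, rows 3-5 increase.
toy_trends = [
    (3.0, 2.0, 1.0), (3.1, 2.1, 0.9), (2.9, 1.8, 1.2),
    (1.0, 2.0, 3.0), (0.9, 2.2, 3.1), (1.1, 1.9, 2.8),
]
toy_medoids, toy_clusters = k_medoids(toy_trends, k=2)
```

Unlike k-means, each cluster centre here is an actual observation (a real gene trend), which makes the representative of each cluster directly interpretable.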

Figure 3: Identification of Ptbp1 as a barrier to iCM splicing repatterning.

a–g, Six gene clusters were identified during reprogramming (a) with gene ontology analysis (b–g, false discovery rate (FDR) <0.05). The number of genes is shown in parentheses. h, i, Representative 20× immunocytochemistry images of cTnT and α-MHC–GFP (h) and quantification (i) of MGT-infected cardiac fibroblasts treated with shRNA against Ptbp1 (shPtbp1) or shRNA non-targeting control (shNT). n = 20 images. Scale bar, 200 μm. Data are mean ± s.e.m., two-sided Student’s t-test: ***P < 0.001. j–q, Splicing analyses of day-3 MGT-infected cardiac fibroblasts treated with shPtbp1 or shNT. j, k, Correlation between ΔPSI of cardiomyocytes versus cardiac fibroblasts, and ΔPSI of MGT versus LacZ (j) or MGT and shPtbp1 versus MGT and shNT (k). The trend line generated by linear regression and P values from a one-sided binomial test are shown. l, Number of detected alternative splicing events among the five alternative splicing types. AS, alternative splice; A3SS/A5SS, alternative 3′/5′ splicing site; IR, intron retention; ES, exon-skipping event; MXE, mutually exclusive spliced exon. m, Positional distribution of a Ptbp1-binding motif (sequence shown across the top). Motif enrichment scores (top) and P values (bottom) were plotted against genomic positions. The dashed black line indicates P = 0.05. Red/blue arrows indicate peaks of enrichment for exons that were included/skipped more often upon Ptbp1 knockdown, respectively. n, o, Gene ontology analysis of alternatively spliced genes between MGT and shPtbp1 and MGT and shNT (n) with a representative Sashimi plot (o). CM, cardiomyocyte; Mitochondrion inner mem., mitochondrion inner membrane; Inclevel, inclusion level. p, q, Expression of overlapping genes between differentially expressed genes (MGT and shPtbp1 versus MGT and shNT) and differentially expressed genes (MGT versus LacZ) (p) and shPtbp1-only differentially expressed genes (q). KD, knockdown; WT, wild type.

Unexpectedly, we found that cluster 1 is also enriched in the gene ontology terms ‘mRNA splicing’, ‘mRNA processing’ and ‘RNA recognition motif’. This finding prompted us to interrogate the role of splicing factor(s) in iCM induction. We therefore used an inducible iCM cell line derived from mouse embryonic fibroblasts (icMEFs)19 to screen a short hairpin RNA (shRNA) library that targeted 26 splicing factors representing the most common splicing factor families20 and identified Ptbp1 as the top candidate that also showed differential expression in cardiac fibroblasts versus cardiomyocytes (Extended Data Fig. 7e–h). Notably, knockdown of Ptbp1 in various primary fibroblasts consistently resulted in a significant increase in reprogramming efficiency (Fig. 3h, i and Extended Data Fig. 8a–p), demonstrating that Ptbp1 is a general barrier to iCM induction. However, overexpression of Ptbp1 had minimal effects (Extended Data Fig. 8q–u). To understand how Ptbp1 silencing led to improved iCM reprogramming, we performed high-depth RNA-seq to analyse alternative splicing events of day-3 reprogramming cells with or without Ptbp1 expression. A total of 1,494 alternative splicing events were detected upon Ptbp1 knockdown, 97% of which were not induced by MGT alone (Extended Data Fig. 9a and Supplementary Tables 4, 5). Notably, calculation of the difference in the percentage of spliced-in (ΔPSI) suggested that alternative splicing events between reprogramming versus control fibroblasts and endogenous cardiomyocytes versus cardiac fibroblasts were in an opposite direction (negative association, P = 0.008). Knockdown of Ptbp1 in reprogramming fibroblasts, however, induced a strong positive association (P = 2.2 × 10⁻¹⁶), suggesting that Ptbp1 silencing together with MGT, but not MGT alone, shifted the splicing pattern from cardiac fibroblast towards cardiomyocyte (Fig. 3j, k).
Furthermore, a higher percentage of exon-skipping events (63%) of the five known alternative splicing types was observed in MGT-infected cells upon Ptbp1 silencing (Fig. 3l). Motif analysis using the RNA map analysis and plotting server (rMAPS)21 showed that a CT-rich Ptbp1-binding motif was significantly enriched in exon-skipping exons compared to background exons (Fig. 3m). Notably, in exons that were included more often upon Ptbp1 knockdown, the motif was strongly enriched within 100 bp of the upstream intron (P < 1 × 10⁻³⁰), whereas, in exons that were skipped more often upon Ptbp1 knockdown, the motif was less strongly enriched, but showed a broad peak at 50–200 bp in the downstream intron (P < 0.05). These data are consistent with the higher percentage of inclusion (69%) than skipping (31%) among exon-skipping events that were observed in Ptbp1 knockdown samples (Extended Data Fig. 9b), suggesting that Ptbp1 is a repressor of exon inclusion when bound to an upstream intron, and is probably a weaker repressor of exon skipping when bound to a downstream intron. Next we assessed the gene ontology terms of genes that were alternatively spliced upon Ptbp1 silencing (Fig. 3n, o and Extended Data Fig. 9c–i). In addition to altering the splicing patterns of genes related to cardiomyocyte lineage and function (Fig. 3n), Ptbp1 silencing resulted in changes in the splicing pattern of 21 other splicing factors, suggesting that Ptbp1 knockdown might trigger a second wave of splicing changes by regulating the switching of the isoform of other splicing factors. Furthermore, we explored the potential downstream effects of Ptbp1-mediated re-patterning of splicing events (Supplementary Table 6 and Supplementary Discussion 3). DESeq2 (ref. 22) analyses of differentially expressed genes revealed that Ptbp1 knockdown enhanced the MGT-induced cardiac fibroblast to cardiomyocyte transcriptome shift by augmenting MGT-mediated changes (Fig. 3p and Extended Data Fig.
9j–n) and altering the expression of an additional set of cardiac and fibroblast lineage genes (Fig. 3q and Extended Data Fig. 9o).
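
The ΔPSI comparison described above can be sketched as follows. This is a hedged illustration only: it assumes PSI is computed from inclusion and skipping read counts, and that the one-sided binomial test asks whether the signs of two ΔPSI vectors agree more often than chance (p = 0.5). The exact formulation used in the analysis is not given in this section, and all function names and numbers below are hypothetical.

```python
from math import comb


def psi(inclusion_reads, skipping_reads):
    """Percentage spliced-in, expressed as a fraction between 0 and 1."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("no reads cover this splicing event")
    return inclusion_reads / total


def delta_psi(condition, control):
    """Delta-PSI between two (inclusion, skipping) read-count pairs."""
    return psi(*condition) - psi(*control)


def sign_concordance(dpsi_x, dpsi_y):
    """Count events whose delta-PSI has the same sign in both comparisons.

    Events with a delta-PSI of exactly zero in either comparison are skipped.
    Returns (number_agreeing, number_compared).
    """
    pairs = [(x, y) for x, y in zip(dpsi_x, dpsi_y) if x != 0 and y != 0]
    agree = sum(1 for x, y in pairs if (x > 0) == (y > 0))
    return agree, len(pairs)


def binomial_upper_tail(k, n, p=0.5):
    """One-sided P value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical delta-PSI values for the same events in two comparisons,
# e.g. cardiomyocyte vs fibroblast and knockdown vs control.
toy_x = [0.20, -0.10, 0.30, 0.15, -0.25, 0.40, 0.05, -0.30]
toy_y = [0.10, -0.40, 0.25, 0.05, -0.10, 0.35, -0.02, -0.20]
toy_agree, toy_n = sign_concordance(toy_x, toy_y)
toy_p = binomial_upper_tail(toy_agree, toy_n)
```

To test for a negative association instead, the same tail can be computed on the number of discordant events (`toy_n - toy_agree`).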

To determine whether cardiac reprogramming is a rare and random event or a Mef2c-, Gata4- and/or Tbx5-determined process, we plotted the expression of Mef2c, Gata4, Tbx5 and M + G + T in each cell against the reprogramming pseudotime of that cell calculated by SLICER (Fig. 4a and Extended Data Fig. 9p). We found that the expression levels of Mef2c, Gata4, Tbx5 and M + G + T are highly correlated with the reprogramming progress, despite the fact that their expression was not used in the generation of the trajectory. We also determined the mean expression levels of Mef2c, Gata4 and Tbx5 and the mean ratio of expression (Mef2c/Gata4, Mef2c/Tbx5 and Gata4/Tbx5) in the fibroblast, intermediate fibroblast, pre-iCM and iCM populations along the reprogramming trajectory (Extended Data Fig. 9q–s). Consistent with our previous studies6,14,23, we observed higher levels of Mef2c than Gata4 and Tbx5 in iCMs, further underscoring the importance of high Mef2c expression in iCM induction.

Figure 4: iCM reprogramming determined by Mef2c, Gata4 and Tbx5 and identification of novel surface markers.

a, Correlation between expression of Mef2c, Gata4 and Tbx5 and SLICER pseudotime. b, Left, correlation between Tbx5 expression and its targets with gene ontology analysis. Right, intercorrelation of genes on the left. Three sets of co-expressed genes (A, B, C) are shown (P < 2.6 × 10⁻⁶). Pos. reg. of nt metabolism, positive regulation of nucleotide metabolism; pos. reg. of transcription, positive regulation of transcription. c, Top 20 potential negative selection markers for iCM. d, Correlation of the expression of the four surface markers (labelled in red in c) and reprogramming progress (left) and the expression of these markers in different cell groups (right violin plots). In violin plots, box plots were included inside the plot. The centre dot represents median gene expression and the central rectangle spans the first quartile to the third quartile of the data distribution. The whiskers above or below the box show the locations of 1.5× interquartile range above the third quartile or below the first quartile. e, f, Representative 40× immunocytochemistry images (e) and quantification (f) of Cd200 and α-MHC–GFP during reprogramming. n = 20 images. Scale bar, 100 μm. Data are mean ± s.e.m. Linear regression reports P < 1 × 10⁻⁴¹ (a) and P < 1 × 10⁻³⁹ (d), α = 0.05, two-sided analysis.

To unravel the gene networks regulated by reprogramming factors, we navigated the relationship between the expression of a reprogramming factor and its downstream targets in each single cell. Using Tbx5 as an example, we calculated the Spearman correlation between Tbx5 expression and the expression of its downstream targets24,25 within each reprogramming cell (Fig. 4b, left). We then generated a correlation matrix for selected Tbx5 targets to determine their co-expression patterns (Fig. 4b, right). The correlation patterns suggest that Tbx5 acts by promoting cardiac function-related genes and by suppressing protein biosynthesis and non-cardiomyocyte lineages (Fig. 4b and Extended Data Fig. 9t, u).
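
The correlation analysis described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the helper names, the placeholder gene labels (`target_up`, `target_down`) and the expression values are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(expression):
    """Pairwise Spearman correlations for a {gene: per-cell values} mapping."""
    genes = list(expression)
    return {g: {h: spearman(expression[g], expression[h]) for h in genes}
            for g in genes}


# Hypothetical per-cell expression values (one entry per cell).
toy_expression = {
    "Tbx5": [0.1, 0.5, 1.2, 2.0, 3.5],
    "target_up": [0.0, 0.4, 0.9, 2.5, 2.6],      # placeholder gene label
    "target_down": [4.0, 3.1, 2.2, 0.8, 0.3],    # placeholder gene label
}
toy_matrix = correlation_matrix(toy_expression)
```

Because Spearman correlation works on ranks, it captures monotonic relationships without assuming linearity, which suits the sparse, skewed distributions typical of single-cell expression data.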

Finally, we aimed to discover novel markers for targeting or enriching cell populations during iCM induction. To identify specific markers for each cell population along the reprogramming trajectory, we selected genes that were expressed significantly higher (for positive selection markers) or lower (for negative selection markers) in the cell population of interest than the other three populations (Tukey-adjusted P value <0.05 in pairwise comparisons after ANOVA; Extended Data Fig. 10a–f and Supplementary Table 7). Negative selection markers for iCMs appeared the most attractive as a supplement to cardiac positive selection markers. Among the top 20 negative markers for iCMs, we focused on four surface markers, Cd200, Clca1, Tm4sf1 and Vcam1 (Fig. 4c). Linear regression analysis suggests that the expression of these markers was highly anti-correlated with the reprogramming process and was barely detectable in iCMs (Fig. 4d). Further experimental validation confirmed that Cd200 was a negative selection marker (Fig. 4e, f), and knockdown of Cd200 did not affect reprogramming efficiency (Extended Data Fig. 10g–n).

We have used single-cell transcriptomics analysis to gain insights into the heterogeneity of cells within an unsynchronized cardiac reprogramming system. The findings show promise for improving the efficiency and detection of iCM formation. We also anticipate that the experimental and analytical methods presented here, when applied in additional cell programming or reprogramming contexts, will yield crucial insights into cell fate determination and the nature of cell type identity.


Mouse strains and plasmids

Transgenic CD1 mice that expressed α-MHC-promoter-driven GFP were described previously1. All animal experiments conformed to the NIH guidelines (Guide for the Care and Use of Laboratory Animals) and UNC Qian Laboratory animal protocol 15.277.0. This protocol was approved by the University of North Carolina at Chapel Hill Institutional Animal Care and Use Committee (IACUC) that oversees the university’s animal care and use (NIH/PHS Animal Welfare Assurance Number: A3410-0; USDA Animal Research Facility Registration Number: 55-R-0004; AAALAC Institutional Number: 329). pMXs retroviral vectors containing mouse Gata4, Mef2c or Tbx5 were described previously1. The empty pMXs and pMXs-puro retroviral vectors were purchased from Cell Biolabs and they contain a partial LacZ stuffer sequence and were therefore referred to as LacZ in this manuscript. pMXs-DsRed and the polycistronic pMXs-puro-MGT were described previously14. Lentiviruses containing Mef2c, Gata4 or Tbx5 were cloned by replacing the GFP insert in pLenti-GFP-puro (Addgene 17448) with Mef2c, Gata4 or Tbx5 using BamHI and SalI. pTripZ-rTtA was cloned by removing the tet-on promoter and RFP in the pTripZ vector26 using XbaI and MluI followed by blunt-end ligation. pTripZ-iMGT was constructed by four steps. First, an intermediate plasmid pTripZ-iRFP was cloned to remove the Ubc promoter, rTtA and Puro sequences from the original pTripZ vector, which was achieved by replacing the sequences between MluI and Acc65I with PCR-amplified WPRE. Second, to introduce an AgeI site before Mef2c, the first ~600 bp of the Mef2c sequence before the BsrGI restriction site was PCR-amplified and cloned into pGEMT-easy (Promega), resulting in pGEMT-AgeI-Mef2c-BsrGI. Third, MGT was excised from pGEMT-MGT14 and inserted into pGEMT-AgeI-Mef2c-BsrGI with BsrGI and SalI, resulting in pGEMT-AgeI-MGT-MluI (there is a MluI site located in the pGEMT-easy vector after SalI). 
Fourth, pTripZ-iMGT was cloned by replacing RFP in pTripZ-iRFP with polycistronic MGT excised from the pGEMT vector using AgeI and MluI. For gene overexpression, Ptbp1, Cd200 and cell cycle-related genes (p15 (also known as Cdkn2b), p16 (also known as Cdkn2a), Ccnd1, Ccnd2 and Ccne1) were PCR amplified from cDNA of neonatal mouse cardiac fibroblasts and cloned into the pLenti vector using BamHI and SalI (or XbaI and SalI for Ptbp1 and Ccnd2 and BamHI and XhoI for Ccnd1). The control pLenti-LacZ vector was cloned by replacing the GFP insert in pLenti-GFP with the partial LacZ sequence from pMXs-puro using BamHI and SalI. Cloning primers are listed in Supplementary Table 8. pBabe-Zeo-LargeT was purchased from Addgene (1779). The non-targeting shNT pLKO.1-scramble plasmid was described previously26 and all other shRNAs (pLKO.1-vector based, MISSION shRNA glycerol stock) were purchased from Sigma and their TRC numbers are listed in Supplementary Table 8.

Isolation of neonatal cardiac fibroblasts and cardiomyocytes, and generation of iCM

We chose to reprogram mouse neonatal cardiac fibroblasts, which were used in the first1 and many of the subsequent cardiac reprogramming studies2,3,4,5,6,8,12,14,15,23,27. Cardiac fibroblasts were isolated using standard protocols described previously23,27. Specifically, neonatal (postnatal day (P)1.5) hearts were isolated from α-MHC–GFP+ pups and rinsed thoroughly with chilled phosphate-buffered saline (PBS). The hearts were then minced by a razor blade, transferred to 8 ml warm 0.05% Trypsin-EDTA (Gibco), and incubated at 37 °C for 10 min. After five rounds of collagenase digestion (5 ml of warm 0.2% collagenase type II in HBSS for 3 min at 37 °C followed by vortexing for 1 min), a single-cell suspension was obtained by passing through 40-μm cell strainers. The cells were then suspended in 1 ml of red-blood-cell lysis buffer (150 mM NH4Cl, 10 mM KHCO3 and 0.1 mM EDTA) for 1 min on ice and resuspended in magnetic-activated cell sorting buffer (MACS buffer: DPBS, 0.5% BSA, 2 mM EDTA). To sort Thy1+ cells, approximately 1 × 10⁷ cells were suspended in 90 μl MACS buffer with 10 μl Thy1.2 micro-beads (Miltenyi Biotec) at 4 °C for 30 min. The cells were then washed, suspended in MACS buffer and applied to an equilibrated LS column (Miltenyi Biotec). Cells bound to beads were flushed out after two washes and seeded onto 0.1% gelatin-coated plates at 2.5 × 10⁴ per cm² in fibroblast medium (IMDM, 20% FBS, 1× penicillin–streptomycin). After overnight culturing, the medium was replaced to remove unattached cells. We refer to the MACS-isolated Thy1+ adherent non-cardiomyocytes as neonatal cardiac fibroblasts. For bulk RNA-seq experiments, neonatal cardiac fibroblasts were similarly isolated, except that MACS-isolated Thy1+ cells were directly lysed in TRIzol (Life Technology) without culturing.
Neonatal cardiomyocytes were isolated using the neonatal cardiomyocytes isolation system (Worthington Biochemical Corporation) except that all enzymes were used at a quarter of the recommended concentration to increase cell viability. After 1.5 h of pre-plating on an uncoated surface to remove attached non-cardiomyocytes, the unattached cardiomyocytes were collected in TRIzol (>80% viability by Trypan blue staining).

For iCM generation, pMXs retroviruses were packaged by transfecting platE cells (Cell Biolabs) with Lipofectamine 2000 (Life Technology) as previously described14. Viruses collected from one 10-cm dish were resuspended in 100 μl iCM medium (10% FBS in DMEM:M199 (4:1)) and added to cells at 5 μl of each virus (if co-transducing) per cm² of surface area. All transductions were performed in iCM medium containing 4 μg ml⁻¹ of polybrene. For single-cell RNA-seq, cardiac fibroblasts were untransduced, transduced with a 1:2 ratio of pMXs-DsRed:pMXs (LacZ) or transduced with equal amounts of Mef2c, Gata4 and Tbx5 viruses. For microarray experiments, cardiac fibroblasts were transduced with the control pMXs-puro (LacZ) or the pMXs-puro-MGT viruses and collected in TRIzol. Day 5, 7, 10 and 14 samples were selected with 2 μg ml⁻¹ puromycin from day 3 and maintained in 1 μg ml⁻¹ puromycin from day 6. Day-0 samples were overnight-cultured cardiac fibroblasts that were collected immediately before viral transduction. For bulk RNA-seq, cardiac fibroblasts were transduced with pMXs-puro (LacZ), pMXs-puro-MGT, pMXs-puro-MGT + shNT or pMXs-puro-MGT + shPtbp1-271 for three days and then collected in TRIzol. All microarray and bulk RNA-seq samples were prepared in duplicate.

Capture of single cells, RNA spike-ins and preparation of cDNA

Single cells were captured using the Fluidigm C1 system (up to 96 single cells per plate). A total of seven individual experiments (E1–E7) were performed starting from mouse breeding, cardiac fibroblast isolation, iCM reprogramming, to single-cell capture and cDNA preparation (see Extended Data Fig. 1 for experimental design and workflow). Three of the seven experiments (E1, E2 and E4) contained only M + G + T-transduced cells. Four of the seven experiments (E3 and E5–E7) contained cells treated with two different conditions in order to estimate the relative abundance of mouse mRNA between treatments. Specifically, for experiments E1, E2 and E4, cardiac fibroblasts transduced with M + G + T for 3 days were collected by trypsinization, stained with 7AAD or NearIR Live/Dead dye (Thermo Fisher Scientific), and FACS-sorted for live cells (negative for the Live/Dead dye). Pilot experiments showed an average diameter of 12.6 μm and a buoyancy of 7.5:2.5 (cells:buoyancy buffer) of cardiac fibroblasts. Therefore, the sorted single-cell suspension (around 2,000 cells per μl) was loaded on a medium-sized (10–17 μm) microfluidic RNA-seq chip (C1 Single-Cell mRNA Seq IFC, 100-6041, initially designed chips were used in E1–E3 and redesigned chips were used in E4–E7) and single cells were captured with the C1 system. Bright field images were taken of each capture site. For experiments E3 and E5–E7, day-3 M + G + T-transduced (E5 and E6) or untransduced (E3 and E7) cardiac fibroblasts were stained with the NearIR Live/Dead dye and 0.25–1 μM carboxyfluorescein succinimidyl ester (CFSE, Thermo Fisher Scientific), whereas the DsRed-transduced cardiac fibroblasts were stained with the NearIR Live/Dead dye only. Then, 12,000 CFSE- and NearIR-stained green fluorescent cells and 12,000 DsRed- and NearIR-stained red fluorescent cells were sequentially FACS-sorted into the same tube and mixed. 
For experiment E3, cell sorting was slightly different, and 700 of each of the CFSE single-positive cells (untransduced), DsRed single-positive cells (DsRed-transduced) and double-negative cells (from the DsRed-transduced wells but with no DsRed protein expression) were sorted into a single-cell suspension. After cell capture, fluorescent images of GFP and RFP channels as well as bright field pictures were taken.

Next, control RNA spike-ins were added into lysis mix A (see Fluidigm’s protocol), which was then loaded onto the IFC plate before cell lysis. Experiments E1 and E2 used the Ambion Array Control spike-ins (AM1780) that were included in the SMARTer kit. E1 used only spikes 1, 4 and 7 according to Fluidigm’s protocol but at a concentration 100-fold higher than suggested, based on recommendations from the UNC Advanced Analytics Core (AAC) that provided the Fluidigm service. E2 used all 8 spike-ins contained in the kit at the following working concentrations (before addition to lysis mix A): 10 pg μl−1 of spike 1, 1 pg μl−1 of spike 2 and with a 10-fold reduction for the next spike and so on. For E3, we used the Ambion spike-ins at half the concentration of those used in E2 and another spike-in, the External RNA Controls Consortium (ERCC) RNA spike-in Mix 1 (Ambion, Life Technologies) after an 80,000-fold dilution. For E4–E7, only the ERCC spike-in was used after a 40,000-fold dilution and 1 μl of the diluted working spike-in was mixed with 19 μl of other components to make lysis mix A. Then cell lysis, reverse transcription and cDNA pre-amplification were performed on the chip according to Fluidigm’s standard protocol and the control RNA spike-ins were processed in parallel with cellular RNA. Differences in the spike-ins added to each experiment reflect how the technology evolved during the course of this project. To address these differences in spike-ins, among other issues, we developed a pipeline described in the ‘Processing and normalization of single-cell RNA-seq data’ section to normalize and analyse all acquired useful data.

Illumina library preparation and sequencing

After in situ cDNA library preparation, the bright field and fluorescent images of each capture site (nest) on the chip were carefully examined. Forty-six empty nests, 30 nests with two or more cells, and 22 nests containing morphologically unhealthy cells out of 672 capture sites on seven chips were excluded from further analysis, resulting in 574 single-cell cDNA libraries. The size distribution and quality of cDNA libraries from each single cell were verified using a bioanalyzer. For E3 only, cDNA library concentrations were measured with picogreen (Thermo Fisher Scientific) and four single-cell cDNA libraries below 1 ng μl−1 were excluded from further analysis. E1–E4 each contained a negative control from an empty nest that was processed in parallel with other healthy single cells. Therefore a total of 574 high-quality cDNA libraries were submitted to the UNC High Throughput Sequencing Facility (HTSF). Illumina libraries were prepared using the Nextera XT DNA Sample Preparation kit according to Fluidigm’s standard protocol, except that 13 cycles of amplification were carried out. The barcoded single-cell Illumina libraries of each experiment were pooled and sequenced for 50-bp single-end reads on Illumina HiSeq 2500. Illumina library preparation and sequencing were carried out in three batches: E1 by itself on two lanes, E2 and E3 processed together on one lane each, and E4–E7 processed together on one lane each. Previous studies showed that 0.5–1 million reads per cell were sufficient to detect most genes expressed by single cells28,29. In this study, we sequenced the cells at about 1–5 × 10^6 reads per cell. Raw reads were re-assigned to each single cell by their unique Nextera barcode and sequencing reads without barcodes were received from the HTSF in .fastq format.

For microarray and bulk RNA-seq samples, cellular RNA was extracted with TRIzol (microarray samples were further purified with the RNeasy kit from Qiagen), and only samples with an RNA integrity number (RIN) above 8, as determined using a bioanalyzer, were further processed. Microarray samples were submitted to the HTSF for one-colour Cy-dye labelling and long oligo (60-mer) Agilent high-density microarrays. Bulk RNA-seq samples were prepared with the TruSeq Stranded mRNA Library Prep Kit (Illumina). The barcoded Illumina libraries were pooled and submitted to the HTSF for sequencing. About 6 × 10^7 100-bp paired-end reads per sample were obtained, and sequencing reads with Illumina indexes removed were received from the HTSF in .fastq format.

Processing and normalization of single-cell RNA-seq data

The quality of sequencing results was first checked with FastQC. Reads were of high quality and no trimming was required. The raw reads were then mapped to the merged genome of mm10, ERCC, and E. coli K12 with TopHat2 using default settings. Information about the number of total reads and the percentages of reads mapped to spike-in or mouse genome for each single cell is detailed in Supplementary Table 1. Outliers showing high ratios of percentage reads mapped to spike-in to percentage reads mapped to mouse genome were removed (Extended Data Fig. 1d). This step removed 61 outliers from the 574 sequenced single cells, resulting in 513 high-quality single cells for analysis (Supplementary Table 2). Gene expression was counted with HTSeq-count using the union mode30 (http://www-huber.embl.de/users/anders/HTSeq).
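The outlier filter described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy percentages and the `max_ratio` cutoff are all hypothetical, since the text does not give a numeric threshold (the actual cutoff is shown only in Extended Data Fig. 1d).

```python
def spike_to_mouse_ratio(pct_spike, pct_mouse):
    """Ratio of percentage reads mapped to spike-ins vs. the mouse genome."""
    if pct_mouse <= 0:
        return float("inf")  # no mouse reads: treat as an extreme outlier
    return pct_spike / pct_mouse


def filter_cells(mapping_stats, max_ratio):
    """Split cells into kept and removed lists using a ratio cutoff.

    mapping_stats: dict cell_id -> (pct_spike, pct_mouse)
    max_ratio: hypothetical cutoff, not a value taken from the paper
    """
    kept, removed = [], []
    for cell, (pct_spike, pct_mouse) in mapping_stats.items():
        if spike_to_mouse_ratio(pct_spike, pct_mouse) > max_ratio:
            removed.append(cell)
        else:
            kept.append(cell)
    return kept, removed


# Toy example with made-up percentages (not data from the study).
stats = {"cell_a": (5.0, 80.0), "cell_b": (60.0, 20.0), "cell_c": (10.0, 70.0)}
kept, removed = filter_cells(stats, max_ratio=1.0)
```

A cell in which most reads come from spike-ins rather than mouse transcripts likely contained little cellular RNA, which is why a high ratio is a reasonable flag for a failed capture.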

Limit of detection of our single-cell RNA-seq was determined as previously described28. In brief, the concentration of each ERCC spike-in in the lysis chamber was first calculated. For experiment E3, seven of the spike-ins were present at 1.24 molecules per chamber and were as follows: ERCC-00014, ERCC-00028, ERCC-00039, ERCC-00067, ERCC-00077, ERCC-00143 and ERCC-00150. For experiments E4–E7, five of the spike-ins were present at 1.24 molecules per chamber and were as follows: ERCC-00031, ERCC-00033, ERCC-00058, ERCC-00069 and ERCC-00134. The number of non-zero measurements of each spike-in was then counted. This number was divided by the total number of high-quality cells from that plate and this is the probability of detection for each spike-in at this concentration. Mean probability of detection of all 12 spike-ins is 0.30, consistent with previous findings28 and suggesting single-molecule sensitivity of our experiments.
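As a rough illustration of the detection-probability calculation described above, the following Python sketch counts non-zero measurements per spike-in and averages the resulting fractions. The function names and the toy counts are assumptions made for illustration; the sketch pools all cells into one list per spike-in, whereas the study computed the fraction per plate.

```python
def detection_probability(counts_per_cell):
    """Fraction of cells in which a spike-in has a non-zero count."""
    if not counts_per_cell:
        raise ValueError("need at least one cell")
    return sum(1 for c in counts_per_cell if c > 0) / len(counts_per_cell)


def mean_detection_probability(spike_counts):
    """Average detection probability over several spike-ins.

    spike_counts: dict spike_id -> list of counts, one per high-quality cell
    """
    probs = [detection_probability(v) for v in spike_counts.values()]
    return sum(probs) / len(probs)


# Toy counts for two spike-ins across four cells (made-up numbers).
toy = {
    "ERCC-00031": [0, 3, 0, 0],  # detected in 1 of 4 cells
    "ERCC-00033": [2, 0, 0, 5],  # detected in 2 of 4 cells
}
mean_p = mean_detection_probability(toy)  # (0.25 + 0.50) / 2
```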

We developed a three-step normalization strategy in order to extract biologically meaningful information from all the single-cell RNA-seq data (Extended Data Fig. 1c). Firstly, we normalized mouse gene raw counts to each cell’s technical and biological size factors within each experiment using a previously described method31. These two size factors account for technical variations within each experiment, such as amplification efficiency and differences in the amount of biological starting material in each cell. On the basis of the normalized DsRed counts, cells in experiments that involved two treatments were classified as DsRed-transduced (E3R, E5R, E6R and E7R, expressing high levels of DsRed), or M + G + T-transduced (E5M, E6M) or untransduced cells (E3U, E7U; Extended Data Fig. 1g).

Secondly, we corrected for ‘batch effects’ that account for technical contributions to experiment-to-experiment variations due to different cell-capture efficiency, types/amounts of spike-ins and Fluidigm chips (Extended Data Fig. 1b), while preserving biological information, such as total mRNA abundance. By comparing biological replicate experiments, we found different mean total mRNA counts per cell (Extended Data Fig. 1h) that probably resulted from varying cell-capture efficiency per plate (68 sequenced cells in E5, 33% more compared to 51 cells in E6; Supplementary Table 2), various amounts of spike-ins used (100-fold more concentrated spike-ins in E1 than in E2) and different types of spike-ins and Fluidigm chips used (Fluidigm spike-in and previous chip in E2 and ERCC spike-in and redesigned chip in E4), suggesting the existence of batch effects. To determine whether different treatments affected mouse mRNA abundance in the cell, we also examined mean total mRNA reads from different treatments in the same experiment (Extended Data Fig. 1h). We found no difference in mean total mRNA counts between uninfected and DsRed-transduced cells (E3U versus E3R and E7U versus E7R) but 40% fewer counts in cells undergoing reprogramming (M + G + T-transduced, E5M, E6M) than in DsRed-transduced cells (E5R, E6R), suggesting biological variations caused by treatment. Therefore, to retain mRNA abundance information while correcting for batch effects, we normalized each treatment in each experiment to an experiment size factor so that the median mRNA count equals 1,000,000 for uninfected and DsRed-transduced cells and 616,136 (deduced from the ratio of median mRNA counts from M + G + T transduction to DsRed transduction (M:R) of E5 and E6) for M + G + T-transduced cells (Extended Data Fig. 1h). This normalization successfully removed the batch effects discussed above. An example is shown in Extended Data Fig. 1i comparing cells from E5 and E6 on a PCA plot.
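The experiment size factor can be sketched in a few lines of Python. This is an illustrative reading of the step, not the authors' pipeline: the function names and toy counts are hypothetical, and only the two target medians (1,000,000 and 616,136) come from the text.

```python
from statistics import median

TARGET_CONTROL = 1_000_000  # uninfected and DsRed-transduced cells
TARGET_MGT = 616_136        # M + G + T-transduced cells


def experiment_size_factor(total_counts, target_median):
    """Factor that rescales a group so its median total mRNA count hits the target."""
    return target_median / median(total_counts)


def normalize_group(cells, target_median):
    """Rescale one treatment group of one experiment.

    cells: dict cell_id -> dict gene -> count (already normalized within the experiment)
    """
    totals = [sum(genes.values()) for genes in cells.values()]
    factor = experiment_size_factor(totals, target_median)
    return {
        cell: {gene: count * factor for gene, count in genes.items()}
        for cell, genes in cells.items()
    }


# Toy group of three cells (made-up counts).
toy = {
    "c1": {"Actb": 300.0, "Col1a1": 100.0},  # total 400
    "c2": {"Actb": 400.0, "Col1a1": 100.0},  # total 500
    "c3": {"Actb": 500.0, "Col1a1": 300.0},  # total 800
}
scaled = normalize_group(toy, TARGET_CONTROL)
scaled_totals = sorted(sum(g.values()) for g in scaled.values())
```

Because every cell in a group is multiplied by the same factor, relative differences between cells within the group are preserved, while the group median is pinned to the treatment-specific target.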

Lastly, we focused on non-immune cells (462 cells in total, see Extended Data Fig. 3 for details) and removed residual batch effects using ComBat, a method that was designed for normalizing gene expression data32 and that performed well in previous studies33. After examination of all experiments in each treatment condition with PCA plots (Extended Data Fig. 2a–c, the ‘Before’ columns), we found no batch effects in the principal component (PC)1/PC2 plot, but started to see incomplete overlap of different experiments in PC3 (for uninfected cells) or PC4 (for M + G + T- and DsRed-transduced cells); PC3 and PC4 only represented <5% variance of the data. Because batch effects were observed between different chips, but not within the same chips, we postulated that the use of two different versions of the Fluidigm medium-size chips might be the cause. The ComBat normalization was run separately for each treatment to remove only technical variations between batches while preserving biological variations between treatments. ComBat requires all input genes to be expressed in all batches, that is, at least one cell in each batch. Therefore, genes that have non-zero counts in all batches were selected and normalized for each treatment. After the normalization, results from different treatments were merged. For those genes that were selected in one treatment but not others, expression levels were set to 0 in other treatment(s). After this procedure, there were a total of 14,414 genes left. PCA analyses with ComBat-normalized counts showed that no batch effects were detected in the top 20 PCs (the ‘After’ columns in Extended Data Fig. 2a–c for PC1–PC4, and data not shown), suggesting successful removal of all residual batch effects in our data.
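The gene-selection and merging rules around ComBat can be sketched as follows. The sketch covers only the bookkeeping (which genes enter ComBat for a treatment, and how per-treatment results are merged with zeros); it does not implement ComBat itself, and the function names and toy data are hypothetical.

```python
def genes_expressed_in_all_batches(counts, batch_of):
    """Return genes with a non-zero count in at least one cell of every batch.

    counts: dict cell_id -> dict gene -> count
    batch_of: dict cell_id -> batch label (for example, the experiment name)
    """
    batches = set(batch_of.values())
    seen = {}  # gene -> set of batches in which it was detected
    for cell, genes in counts.items():
        for gene, value in genes.items():
            if value > 0:
                seen.setdefault(gene, set()).add(batch_of[cell])
    return sorted(g for g, b in seen.items() if b == batches)


def merge_treatments(per_treatment):
    """Merge per-treatment matrices, setting genes absent from a treatment to 0.

    per_treatment: dict treatment -> dict cell_id -> dict gene -> value
    """
    all_genes = sorted({g for cells in per_treatment.values()
                        for genes in cells.values() for g in genes})
    merged = {}
    for cells in per_treatment.values():
        for cell, genes in cells.items():
            merged[cell] = {g: genes.get(g, 0.0) for g in all_genes}
    return merged


# Toy example: G2 is detected only in batch E1, so it is not selected.
counts = {"a1": {"G1": 3, "G2": 0}, "a2": {"G1": 0, "G2": 1}, "b1": {"G1": 2, "G2": 0}}
batch_of = {"a1": "E1", "a2": "E1", "b1": "E2"}
selected = genes_expressed_in_all_batches(counts, batch_of)
merged = merge_treatments({"MGT": {"m1": {"G1": 1.0}}, "DsRed": {"r1": {"G2": 2.0}}})
```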

Analysis of single-cell RNA-seq data

Outlier detection, PCA, hierarchical clustering and the generation of violin plots were performed with the ‘SINGuLAR Analysis Toolset’ package (Fluidigm) in R. First, normalized expression was log2-transformed before analysis. Outliers were detected and removed based on mean gene expression and PCA using the SINGuLAR package (Extended Data Fig. 2d, e), resulting in 454 high-quality non-immune cells for downstream analysis. Expression of the reprogramming factors Mef2c, Gata4 and Tbx5 was excluded before PCA, hierarchical clustering or SLICER (see ‘Trajectory construction and identification of genes related to iCM reprogramming’) analysis. Next, the top 400 PCA genes were selected by largest weight (loading) contribution to PCs 1, 2 or 3. Then hierarchical clustering was performed with these 400 genes and cells were grouped as fibroblasts (fibroblasts from control plates), intermediate fibroblasts (fibroblasts from M + G + T plates), pre-iCMs (cells expressing both cardiac and fibroblast markers) and iCMs (Fig. 1a). The group information was used to generate PCA plots (Fig. 1b, c), violin plots (Extended Data Fig. 4b, l) and to perform analyses of variance (ANOVA; Fig. 4c, d and Extended Data Fig. 10a–e). ANOVA and Tukey post hoc tests were performed with custom scripts in R in order to identify positive- or negative-selection markers for iCM and pre-iCM. For ANOVA, CCI but not CCA cells were used. For violin plots in Figs 2, 4, box plots were overlaid over the violin plots. The centre dot represents median gene expression and the central rectangle spans the first quartile to the third quartile of the data distribution. The whiskers above or below the box show the locations of 1.5× interquartile range above the third quartile or below the first quartile. t-distributed stochastic neighbour embedding (tSNE) analysis was performed with the ‘Rtsne’ package in R. 
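One plausible reading of the gene-selection rule above (top genes by largest loading on PCs 1, 2 or 3) is sketched below in Python. It assumes that each gene is scored by its largest absolute loading across the first three PCs; the exact rule used by the SINGuLAR package may differ, and the function name and toy loadings are hypothetical.

```python
def top_genes_by_loading(loadings, n_top, n_pcs=3):
    """Rank genes by their largest absolute loading on the first n_pcs PCs.

    loadings: dict gene -> sequence of loadings (PC1, PC2, PC3, ...)
    """
    score = {g: max(abs(w) for w in ws[:n_pcs]) for g, ws in loadings.items()}
    ranked = sorted(score, key=lambda g: (-score[g], g))  # ties broken by name
    return ranked[:n_top]


# Toy loadings on four PCs (made-up values); the study selected 400 genes.
toy_loadings = {
    "Myh6": (0.9, 0.1, 0.0, 0.0),
    "Col1a1": (-0.8, 0.2, 0.1, 0.0),
    "Gapdh": (0.05, 0.02, 0.01, 0.95),  # large loading only on PC4, so ignored
    "Top2a": (0.1, -0.7, 0.0, 0.0),
}
top3 = top_genes_by_loading(toy_loadings, n_top=3)
```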
Gene ontology analysis was performed using the DAVID functional annotation tool version 6.7 (https://david.ncifcrf.gov/). All gene ontology terms shown in this study have a P value or corrected P value (FDR) <0.05. For the comparison of distributions of number of detected genes in different cell groups, we conducted a one-sided two-sample Kolmogorov–Smirnov test (Extended Data Fig. 4n, o). Because we are comparing the distributions of two samples, the conclusion is more general than a mean test, such as a t-test, and does not rely on restrictive statistical assumptions, such as normal distributions.
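To make the one-sided two-sample Kolmogorov–Smirnov comparison concrete, the sketch below computes the one-sided statistic from the two empirical distribution functions and a large-sample approximation of its P value. It is a standard-library illustration only; the function name is hypothetical, and the asymptotic formula is an approximation that statistical packages may replace with an exact calculation for small samples.

```python
import math


def ks_one_sided(sample_x, sample_y):
    """One-sided two-sample KS test.

    Returns (d_plus, approx_p), where d_plus = max over t of F_x(t) - F_y(t).
    A large d_plus suggests that values in sample_x tend to be smaller
    than values in sample_y.
    """
    n, m = len(sample_x), len(sample_y)
    xs, ys = sorted(sample_x), sorted(sample_y)
    d_plus, i, j = 0.0, 0, 0
    for t in sorted(set(xs) | set(ys)):
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d_plus = max(d_plus, i / n - j / m)
    # Asymptotic (large-sample) approximation of the one-sided P value.
    p = math.exp(-2.0 * (n * m / (n + m)) * d_plus ** 2)
    return d_plus, min(1.0, p)


# Toy samples (made-up numbers of detected genes, in thousands).
d_stat, p_val = ks_one_sided([1, 2, 3, 4], [5, 6, 7, 8])
```

Because the statistic compares whole empirical distributions, it responds to shifts in shape as well as location, which is the property the text contrasts with a test of means.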

In Fig. 2a–c, cells from experiments E1–E3 are shown. Analysis of data from experiments E4–E7 showed consistent results (Extended Data Fig. 6e, f, k). For Fig. 2c, first, CCI and CCA cells in fibroblasts or epicardial-like cells (Fig. 2a, b and Supplementary Discussion 1) were merged into one Fb/Epi group. Then a new hierarchical clustering for control cells in E3 was calculated using the four cell-lineage-related gene clusters but not the cell cycle genes identified in Fig. 2a. The calculated hierarchical clustering was very similar to that in Fig. 2a and was applied to reprogramming cells from E1 and E2 to generate Fig. 2c. For all correlation analyses, gene expression was always log-transformed before analysis. In Fig. 4a, d and Extended Data Fig. 9p, CCI cells were used. Linear regression was performed to obtain the regression coefficient (R value) and its corresponding P value (two-sided, α = 0.05). For correlation analysis of Tbx5 and its target genes in Fig. 4b, M + G + T-transduced CCI cells were included. The list of Tbx5 ChIP–seq peaks in HL-1 and the list of genes differentially expressed in wild-type versus Tbx5-null mutant mouse hearts were obtained from previous studies24,25. Genes present in both lists (2,109 genes) were selected as Tbx5 downstream targets and used to calculate Fig. 4b. A total of 170 genes with a Spearman correlation coefficient >0.3 or <–0.3 were selected and their correlation coefficient with Tbx5 is plotted in Fig. 4b (left). Then intercorrelation between these genes was calculated and the correlation matrix ordered by hierarchical clustering is shown as a heat map in Fig. 4b (right). Three sets of genes A, B, C were found to be co-expressed (P < 2.6 × 10^−6 by Spearman correlation). Representative genes of these sets are listed on the right and their corresponding gene ontology terms are labelled on the left (Fig. 4b).
For correlation analysis of Mef2c, Gata4 and Tbx5 expression and expression of transcription factors or splicing factors, M + G + T-transduced CCI cells are shown in Extended Data Fig. 9t, u. The list of mouse transcription factors was obtained from public databases as previously described12 and the list of splicing factors was obtained from a previous study34.
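The Spearman-based selection of target genes (coefficient >0.3 or <–0.3) can be illustrated with a self-contained Python sketch. The helper names and the toy expression vectors are hypothetical; only the 0.3 cutoff comes from the text.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_targets(factor_expr, target_expr, cutoff=0.3):
    """Keep targets whose correlation with the factor is above +cutoff or below -cutoff."""
    rho = {g: spearman(factor_expr, v) for g, v in target_expr.items()}
    return {g: r for g, r in rho.items() if abs(r) > cutoff}


# Toy log-expression values across five cells (made-up numbers).
tbx5 = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = {
    "gene_up": [1.0, 1.5, 2.0, 4.0, 9.0],
    "gene_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_flat": [3.0, 5.0, 1.0, 2.0, 4.0],
}
selected = correlated_targets(tbx5, targets)
```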

Trajectory construction and identification of genes related to iCM reprogramming

We used SLICER (selective locally linear inference of cellular expression relationships)17, an algorithm that we have previously developed, to construct cellular trajectories of iCM reprogramming. SLICER is implemented as an R package, which is freely available on the Comprehensive R Archive Network (CRAN) and on GitHub (https://github.com/jw156605/slicer). In brief, SLICER discovers a nonlinear, low-dimensional manifold embedded in gene expression space that indicates how cellular gene expression profiles change during a sequential process. Additionally, SLICER automatically detects the presence, location and number of branches in a trajectory, corresponding to multiple cell fates or multiple cellular processes occurring simultaneously. Here, the manifold corresponds to the reprogramming process by which fibroblasts turn into iCMs. To ensure consistency between the clustering and trajectory analyses, we ran SLICER on the control and reprogramming cardiac fibroblasts using the top 400 PCA genes, rather than using SLICER’s gene selection approach. We performed nonlinear dimensionality reduction using a technique called locally linear embedding (LLE), which is analogous to a nonlinear version of PCA. Here, we used a three-dimensional LLE projection for trajectory construction. We then build a k-nearest neighbour graph in the low-dimensional manifold space produced by LLE. Shortest paths through the neighbour graph correspond to geodesics along the manifold, and we use the lengths of these shortest paths to order cells according to their distances from a user-defined starting cell. The steps of the reprogramming process can then be traced by examining the cells one-by-one in the specified ordering. We also investigated the distribution of cells along pseudotime, reasoning that local differences in density could indicate the relative speed of changes and stability of intermediate states.
We estimated the density of cells in pseudotime using a Gaussian kernel density estimator, then calculated the free energy as max(density) – density. Non-proliferating M + G + T-transduced cells were used for free energy calculation (Fig. 1f).
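The ordering and free-energy steps can be sketched in Python as follows. This is not the SLICER implementation: the sketch skips LLE and starts from low-dimensional coordinates that are assumed to be given, and the function names, the choice of k, the kernel bandwidth and the toy coordinates are all hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as {i: {j: distance}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, start):
    """Shortest-path lengths from the starting cell (Dijkstra's algorithm)."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def free_energy(pseudotime, grid, bandwidth):
    """max(density) - density, using a Gaussian kernel density estimate."""
    norm = len(pseudotime) * bandwidth * math.sqrt(2.0 * math.pi)
    density = [
        sum(math.exp(-0.5 * ((g - t) / bandwidth) ** 2) for t in pseudotime) / norm
        for g in grid
    ]
    peak = max(density)
    return [peak - d for d in density]


# Toy 3D coordinates standing in for an LLE projection (made-up values).
points = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
dist = geodesic_distances(knn_graph(points, k=2), start=1)
order = sorted(dist, key=dist.get)  # cells ordered by distance from the start cell
energy = free_energy([dist[c] for c in order], grid=[0.0, 1.0, 2.0, 3.0], bandwidth=0.5)
```

Under this definition, regions of pseudotime where cells accumulate have low free energy, which is consistent with reading them as relatively stable states, while sparsely populated regions have high free energy.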

Using a method similar to the previously described method of ref. 35, we used nonlinear regression to identify genes that are significantly related to the reprogramming process. Only non-proliferating cells were included in this analysis. For each gene with mean expression above 1, we fit a generalized additive model (GAM) of the Tobit family (VGAM R package). The GAM approach uses cubic splines to fit a smooth nonlinear model, and the Tobit likelihood accounts for zero inflation by modelling gene expression dropout as data censoring. To avoid overfitting the data, which would result in a curve that is too ‘wiggly’, we constrained the GAM fits to use three degrees of freedom. We then identified genes that were significantly related to the reprogramming process using a likelihood ratio test, with a constant GAM as the null model. Using k-medoid clustering (pam algorithm from the cluster R package), we identified clusters of significantly related genes that showed similar trends over the reprogramming process (Fig. 3a and Extended Data Fig. 7a).

Analysis of bulk RNA-seq and microarray data

Bulk RNA-seq data were analysed similarly to single-cell data, except that they were only normalized for sequencing depth. Specifically, raw counts from HTSeq-count were divided by the total number of mm10 mRNA reads from that sample and then multiplied by 1 × 10^6 to give counts per million. For differential expression analysis of LacZ versus MGT samples and MGT and shNT versus MGT and shPtbp1 samples, raw counts were input into DESeq2 (ref. 22) in R and lists of differentially expressed genes were obtained (FDR < 0.05, fold change >1.25). Heat maps were generated using the heatmap.2 function in the ‘gplots’ package in R. The microarray data were processed using the limma package of Bioconductor36. Raw data were first background-corrected and normalized using ‘normexp’ and ‘quantile’ methods, respectively. Next, control probes and low-intensity probes were filtered out using the 1.1 multiplier of the 95% quantile for negative controls as a cut-off. Lastly, probe intensity data were log2-transformed and replicated probes for each gene were averaged for subsequent analyses. PCAs were performed in R with the ‘prcomp’ function using all of the 34,378 detected genes and the 3D plot was generated with the ‘scatterplot3d’ package in R (Extended Data Fig. 4p, q).
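The counts-per-million step for the bulk RNA-seq data amounts to a single rescaling per sample. A minimal Python sketch, with a hypothetical function name and toy counts, is:

```python
def counts_per_million(raw_counts):
    """Scale raw gene counts so that they sum to one million for the sample.

    raw_counts: dict gene -> raw read count for one sample (mouse mRNA reads only)
    """
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped mRNA reads")
    return {gene: count / total * 1_000_000 for gene, count in raw_counts.items()}


# Toy sample (made-up counts; total of 2,000 reads).
sample = {"Ptbp1": 250, "Myh6": 750, "Tnnt2": 1000}
cpm = counts_per_million(sample)
```

Note that this depth normalization is used here only for the expression values themselves; the text states that differential expression was run on raw counts in DESeq2, which applies its own normalization.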

Analysis of splicing

We aligned the bulk RNA-seq data (100-bp, paired-end reads) to mm10 using Mapsplice version 2.1.4. To detect alternative splicing, we used rMATS37 version 3.2.5 with Ensembl GRCm38.82 gene annotations and the novelSS (novel splicing site) flag to identify unannotated splicing events. All other rMATS parameters were set to the default values. In Fig. 3j–l, n, o and Extended Data Fig. 9a–i, we used FDR < 0.05 and ΔPSI > 15 as cut-offs, resulting in 1,494 alternative splicing events for MGT and shPtbp1 versus MGT and shNT, and 879 alternative splicing events for MGT versus LacZ (see Supplementary Tables 4, 5). In Fig. 3j, k, to determine whether the direction (sign) of PSI change is consistent between two group pairs, we identified the overlapping alternative splicing events between the samples (69 overlapping events between MGT versus LacZ and cardiomyocytes versus cardiac fibroblasts, and 155 events between MGT and shPtbp1 versus MGT and shNT, and cardiomyocytes versus cardiac fibroblasts). We then conducted a binomial test by first transforming the paired ΔPSI data into either +1 or −1 according to whether their signs agreed or not, and then calculating the proportion of +1 values and comparing it to 50% using a one-sided binomial test. The results for ΔPSI (cardiomyocyte–cardiac fibroblasts) versus (MGT–LacZ) showed that the signs in these two groups agree for only 34.78% of events (P = 0.0077). Therefore, we conclude that there was enough statistical evidence to support that the directions (signs) in cardiomyocyte–cardiac fibroblasts and MGT–LacZ are different. The result for cardiomyocyte–cardiac fibroblasts versus shPtbp1–shNT shows that the signs in the two groups are the same for 83.22% of events (P = 2.2 × 10^−16). Therefore we conclude that there is enough statistical evidence to support that the directions (signs) in cardiomyocyte–cardiac fibroblasts and shPtbp1–shNT are the same. To plot the positional distribution of Ptbp1-binding motifs, we used rMAPS21 version 1.0.5.
The rMAPS tool can only be used for exon-skipping events and has a database of known binding motifs for RNA-binding proteins, including Ptbp1. It considers exon-skipping events with FDR < 0.05 and ΔPSI > 5 as statistically significant and all others as insignificant. It then takes all exon-skipping events (both significant and insignificant) and uses the events that are not significant to create a background profile. We extracted the exon-skipping events from the rMATS comparison of MGT and shPtbp1 versus MGT and shNT and ran rMAPS on this list to generate Fig. 3m.
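The sign-agreement test on paired ΔPSI values described in this section can be sketched with an exact one-sided binomial tail in Python. The function names are hypothetical, and the count of 24 agreeing events out of 69 is an assumption inferred from the reported 34.78% agreement rather than a number stated in the text.

```python
from math import comb


def count_agreements(dpsi_a, dpsi_b):
    """Number of paired events whose delta-PSI values share the same sign."""
    return sum(1 for a, b in zip(dpsi_a, dpsi_b) if a * b > 0)


def sign_agreement_test(n_agree, n_total, alternative):
    """Exact one-sided binomial test of the agreement proportion against 0.5.

    alternative: "less" tests for agreement below 50%,
                 "greater" tests for agreement above 50%.
    """
    if alternative == "less":
        ks = range(0, n_agree + 1)
    elif alternative == "greater":
        ks = range(n_agree, n_total + 1)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return sum(comb(n_total, k) for k in ks) / 2 ** n_total


# Assumed counts: 24 of 69 events agree (inferred from the reported 34.78%).
p_less = sign_agreement_test(24, 69, "less")

# Toy paired delta-PSI values (made-up): two of three pairs share a sign.
n_same = count_agreements([0.2, -0.3, 0.5], [0.1, 0.4, 0.6])
```

Under the null hypothesis each event agrees in sign with probability 0.5, so the tail probability is a sum of binomial coefficients divided by 2 to the power of the number of events.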

Proliferation assays

Lentiviruses were packaged by transfecting HEK293T cells with Lipofectamine 2000 as previously described12. For packaging shRNA viruses, a total of 10 μg plasmids consisting of equivalent concentrations of each of the four or five shRNAs targeting different regions of the gene were used. Lentiviral Mef2c, Gata4, Tbx5, inducible MGT (iMGT) and large T antigen were added to cells at 5 μl of each virus (if co-transducing) per cm2 of surface area and all other lentiviruses (rtTA, LacZ, cell-cycle related genes and shRNA) were used at 2.5 μl per cm2. For Extended Data Fig. 5b–m, pMXs-puro-MGT was used for iCM induction. In the EdU-incorporation assay, cells were pulsed with 10 μM of EdU for three days before staining with Alexa Fluor 647-labelled EdU of the Click-iT Plus EdU Alexa Fluor 647 Flow Cytometry Assay Kit (ThermoFisher Scientific, C10634). Propidium iodide (Life Technologies, P3566) staining was performed as previously described19. For iMGT induction, doxycycline was added at 1 μg ml−1 and the medium was changed every 2–4 days. Puromycin (puro) selection was performed at 1 μg ml−1 and the medium was changed every 2–4 days. For the cell-cycle synchronization assay, cardiac fibroblasts were treated with 400 ng ml−1 of nocodazole (G2/M) or cultured in a low serum condition (0.5% FBS, G0/G1) for four days before iCM induction; the normal serum condition was 10% FBS. To generate CF-T cells, neonatal cardiac fibroblasts were transduced with pBabe-largeT and selected using 600 ng ml−1 Zeocin in fibroblast medium (IMDM, 20% FBS, 1× penicillin–streptomycin) from day 2 for two weeks. The resulting CF-T cells were a relatively homogeneous pool of cells after antibiotic selection. All data are representative of multiple repeated experiments.

shRNA library screen, immunofluorescence staining, flow cytometry and qRT–PCR

For screening of the shRNA library targeting splicing factors, MGT expression was induced by 1 μg ml−1 doxycycline in icMEFs19 that constitutively express Tet and the MGT construct under the control of a Tet-ON promoter. Lentiviruses containing mixed clones of shRNAs targeting each gene were added to the cells at 5 μl per cm2. For Ptbp1 protein expression, western blotting was performed as previously described14 (anti-Ptbp1, Cell Signaling 8776, 1:500). Adult cardiac fibroblasts (AdCF) and tail-tip fibroblasts (AdTTF) were isolated using the explant method as previously described15. Clone 271 was used for all shPtbp1-related experiments, except for the initial screen, whereas mixed clones of shCd200 viruses were used for shCd200-related experiments. Information on shRNAs is listed in Supplementary Table 8.

Immunofluorescence staining and flow cytometry were performed as previously described12. Primary antibodies were used at the following dilutions: rabbit anti-GFP (Invitrogen, A11122, 1:500), chicken anti-GFP (Abcam, ab13970, 1:1,500), anti-α-SMA (Sigma, A2547, 1:200), anti-SM22α (Abcam, ab14106, 1:200), anti-α-actinin (Sigma, A7811, 1:500), anti-Cx43 (Sigma, C6219, 1:200), APC-Thy1.2 (eBioscience, 17-0902-81, 1:100) and APC-Cd200 (Biolegend, 123809, 1:200). Images were captured using an EVOS FL Auto Cell Imaging System (Life Technologies). All images shown in this study were overlaid with Hoechst nuclear staining, except for live images. For quantification, 10–30 images from multiple repeated experiments were randomly taken at 10×, 20× or 40× magnification at the same exposure and then counted in a blinded way. For Extended Data Fig. 6g–i, neonatal hearts were minced into small pieces and plated with fibroblast medium (the explant method14). After seven days of migration, the adherent cells were either immunostained in situ (Cy3-α-SMA, Sigma, C6198, 1:500; APC-Thy1.1, eBioscience, 17-0900-82, 1:100; PE-CD31, Biolegend, 102408, 1:200), or trypsinized, filtered through 40-μm cell strainers (Thermo Scientific), immunostained for Thy1.2 and α-SMA/CD31, and then analysed by flow cytometry. All flow cytometry data were collected on a Beckman Coulter Cyan ADP flow cytometer (UNC Flow Cytometry Core Facility) and analysed with the FlowJo software (Tree Star). qRT–PCR was performed as previously described12 (see Supplementary Table 8 for primer sequences). All data are representative of multiple repeated experiments.


Unless otherwise stated, values are expressed as mean ± standard deviation (s.d.) or standard error of the mean (s.e.m.) of multiple biologically independent samples. Statistical tests performed include Student’s t-test, one-way ANOVA followed by post hoc correction, linear regression, Spearman correlation, Kolmogorov–Smirnov test, binomial test, likelihood ratio test and χ2 test. Application and results of these tests are described in detail in Methods and figure legends. Generally, *P < 0.05 was considered statistically significant, **P < 0.01 was considered highly significant and ***P < 0.001 was considered very highly significant. All data are representative of multiple repeated experiments.

Data availability

The RNA-seq data that support the findings of this study are available in the Gene Expression Omnibus (GEO) under the accession number GSE98571. Source Data for all figures are available in the online version of the paper.





We thank the UNC AAC Core, HTSF Core and Flow Core for technical support. This study was supported by NIH HG06272 to J.F.P., NIH BD2K Fellowship (T32 CA201159) and NIH F31 Fellowship (HG008912) to J.D.W., NIH/NHLBI R00 HL109079 and American Heart Association (AHA) 15GRNT25530005 to J.L., AHA 13SDG17060010, Ellison Medical Foundation (EMF) AG-NS-1064-13, and NIH/NHLBI R01HL128331 to L.Q., and gifts from H. McAllister and C. Sewell.

Author information

Author notes

    Ziqing Liu, Li Wang & Joshua D. Welch

    These authors contributed equally to this work.


  1. McAllister Heart Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA

    Ziqing Liu, Li Wang, Hong Ma, Yang Zhou, Haley Ruth Vaseghi, Shuo Yu, Joseph Blake Wall, Sahar Alimohamadi, Michael Zheng, Chaoying Yin, Jiandong Liu & Li Qian

  2. Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA

    Ziqing Liu, Li Wang, Hong Ma, Yang Zhou, Haley Ruth Vaseghi, Shuo Yu, Joseph Blake Wall, Sahar Alimohamadi, Michael Zheng, Chaoying Yin, Jiandong Liu & Li Qian

  3. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA

    Joshua D. Welch & Jan F. Prins

  4. Department of Statistics, University of California at Irvine, Irvine, California 92697, USA

    Weining Shen




L.Q., Z.L. and L.W. conceived and designed the study. Z.L. and L.W. designed and performed single-cell RNA-seq. Z.L., L.W., Y.Z. and C.Y. prepared samples for microarray and bulk RNA-seq. L.W., Y.Z., H.M., H.R.V., C.Y., S.Y., J.B.W., S.A. and M.Z. performed other experiments. Z.L., J.D.W. and J.F.P. performed data analysis and modelling. W.S. helped with statistical analysis. Z.L., J.D.W., J.L. and L.Q. wrote the manuscript, with extensive input from all authors. J.L. and L.Q. provided funding and overall supervision.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Jiandong Liu or Li Qian.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Discussions 1–3 and Supplementary Figure 1, the raw images of the western blots.

  2. Life Sciences Reporting Summary

Excel files

  1. Supplementary Tables

    This file contains Supplementary Tables 1–8.
