Polysome profiling followed by RNA-seq of cardiac differentiation stages in hESCs

The regulation of gene expression acts at numerous complementary levels to control and refine protein abundance. The analysis of mRNAs associated with polysomes, called polysome profiling, has been used to investigate the post-transcriptional mechanisms that are involved in different biological processes. Pluripotent stem cells are able to differentiate into a variety of cell lineages, and the cell commitment progression is carefully orchestrated. Genome-wide expression profiling has provided the possibility to investigate transcriptional changes during cardiomyogenesis; however, a more accurate study regarding post-transcriptional regulation is required. In the present work, we isolated and high-throughput sequenced ribosome-free and polysome-bound RNAs from NKX2-5eGFP/w HES3 undifferentiated pluripotent stem cells at the subsequent differentiation stages of cardiomyogenesis: embryoid body aggregation, mesoderm, cardiac progenitor and cardiomyocyte. The expression of developmental markers was followed by flow cytometry, and quality analyses were performed as technical controls to ensure high quality data. Our dataset provides valuable information about hESC cardiac differentiation and can be used to investigate genes potentially controlled by post-transcriptional mechanisms.


Background & Summary
Gene expression is controlled by a series of mechanisms, finally leading to protein formation. Continued findings regarding new regulatory mechanisms of gene expression are due to an increased understanding of RNA 1,2 and more integrative analysis tools 3,4 . The post-transcriptional level of regulation includes transcript synthesis, 5 0 capping, splicing, polyadenylation, nuclear export, translation and decay 5 . Translation variants have already been shown as crucial determinants of mammalian gene expression 6,7 , but genome-wide expression profiling is not able to detect the fine adjustment provided by posttranscriptional mechanisms. To overcome this issue, the polysome profiling technique has been used to isolate and further independently analyze ribosome-free and polysome-bound RNAs. RNAs associated with many ribosomes, called polysomes, form large complexes of high molecular weight 8 and can be easily segregated from ribosome-free RNAs through a sucrose gradient [9][10][11][12] .
Protein synthesis control pathways and post-transcriptional mechanisms involved in cell fate commitment are still being established [13][14][15] . Cardiac tissue formation occurs through precise activation of specific sequential genetic programs to drive cells to differentiation. During embryonic development, cardiomyocytes are derived from the cardiogenic mesoderm through modulation of many pathways and signaling molecules, including bone morphogenetic proteins (BMP), fibroblast growth factors (FGF), NODAL, and canonical and non-canonical Wnt (reviewed by 16 ). Moreover, the functional interconnection between transcription factors, their gene targets and signaling pathways delineates cardiomyogenesis and is evolutionarily conserved 17 . The use of isolated human embryonic stem cells (hESC) and induced pluripotent stem cells (iPSC) to derive specific cell lineages in vitro raised the possibility of artificially reproducing and studying this differentiation process. Modifications of signaling pathways were used to differentiate hESC to cardiomyocytes [18][19][20][21] , which are potential resources for cell therapy and can be used as tools for developmental studies, investigation of endogenous regenerative promotion and cardiac toxicity assays 18,22,23 . However, there is still a lack of detailed information about the complex gene regulatory network that controls cardiac commitment. Unveiling key regulatory elements and molecular signatures of the intermediate differentiation stages can further our current understanding of human cardiac development and produce, select and identify suitable cells for a range of different applications 24 .
Here, we describe the polysome profiling during the developmental steps of cardiomyogenic commitment. Ribosome-free and polysome-bound mRNAs were isolated and sequenced on D0, D1, D4, D9 and D15, which represents pluripotency, embryoid body (EB) aggregation, cardiac mesoderm, cardiac progenitor and cardiomyocyte stages, respectively (Fig. 1b). Three independent experiments were prepared using 2 to 6 million cells on each time-point mentioned, and technical controls for each analyzed sample and experimental stage were done to ensure high quality data. An overview of the study design is illustrated in Fig. 1a. Our dataset provides valuable information regarding hESC cardiac differentiation and can be used to investigate genes potentially controlled by post-transcriptional mechanisms. Moreover, these data are a powerful tool to explore new elements involved in cardiac cell fate commitment and contributes to the development of novel therapy and research approaches.

Cardiomyogenic differentiation of hESCs
A cardiac differentiation protocol was adapted from a previously described source in 18 and consists of 3 steps: embryoid body (EB) formation, mesoderm induction and cardiac progenitor induction. Initially, 7 × 10 5 cells/well were plated on Growth Factor Reduced Matrigel Matrix (Corning) 6-well coated dishes and maintained for 72 h in a humidified incubator with 5% CO 2 at 37°C. At day 0 (D0) of protocol, hESCs were incubated with collagenase I (1 mg/mL) for 20 min at 37°C, followed by trypsin-EDTA (0.05%) for approximately 1 min. Immediately after, trypsin was carefully removed, and a medium containing 50% of fetal bovine serum (FBS) and DNAse I (20 U/mL, Invitrogen) was added to the plate. Cells were detached with a cell scraper to avoid single-cell detachment and centrifuged at 230 × g for 5 min. After removal of the supernatant, a basal medium composed of StemPro34 (StemPro™-34 SFM, Gibco™), 100 U/mL penicillin, 100 μg/mL streptomycin, 2 mM L-Glutamine, 150 μg/mL transferrin, 50 μg/mL ascorbic acid and 0.45 mM monothioglycerol (MTG) was supplemented with 1 ng/mL BMP4 (R&D systems, cat. 314-BP) and added gently. The cell pellet was resuspended to form small clusters of 10-20 cells, which were seeded into ultra-low attachment 6-well culture plates (Corning ® Costar ® Ultra-Low Attachment plate) and kept in a humid incubator at 37°C, 5% CO 2 and 5% O 2 (hypoxia) for EB aggregation for 24 h. At day 1 (D1), EBs were collected and decanted in a round bottom plastic tube for 30 min. After this period, the supernatant was gently removed, and EBs were resuspended in basal medium supplemented with 10 ng/mL BMP4, 6 ng/mL Activin A (R&D systems, cat. 338-AC) and www.nature.com/sdata/ SCIENTIFIC DATA | 5:180287 | DOI: 10.1038/sdata.2018.287 5 ng/mL FGF2 (R&D systems, cat. 233-FB) to induce mesoderm specification. After 72 h, on day 4 (D4), the medium was replaced with basal medium supplemented with XAV939 (10 μM/mL) (Tocris, cat. 3748) and VEGF (10 ng/mL) (R&D systems, cat. 293-VE) to induce cells into cardiac progenitors. On days 8 and 11, the medium was replaced with basal medium supplemented with VEGF (10 ng/mL) and BMP4 (1 ng/mL). The cells were kept in a humid incubator at 37°C, 5% CO 2 and 5% O 2 during all the procedure. Three independent differentiation assays were used as experimental replicates. As a control of cardiomyogenic differentiation, hESC were submitted to the same processing without adding any induction factor (non induced differentiation).

RNA isolation and quality control
Ribosome-free (fractions 1-3) and polysomal (fractions 10-22) fractions were pooled, and RNA was isolated using the Direct-zol RNA MiniPrep (Zymo Research), following the manufacturer's instructions. Quality and quantity of RNA were determined using RNA 6000 Pico (for ribosome-free) and Nano (for polysome-bound) kits and an Agilent 2100 Bioanalyzer (Agilent Technologies) and compared to reference samples. Ribosome-free RNA samples ranged from 0.2 to 3.4 μg and polysome-bound from 5 to 16 μg.

High-Throughput sequencing and data analysis
A total of 30 samples were prepared for sequencing ( Table 1). The cDNA libraries were prepared with 200-500 ng of ribosome-free or 2 μg of polysome-bound RNA using the TruSeq Stranded mRNA Sample Preparation kit (Illumina, Inc.). Quality and quantity of cDNA libraries were determined using the DNA 1000 kit, Agilent 2100 Bioanalyzer and KAPA Library Quantification qPCR (KAPA Biosystems). RNAseq was carried out in an Illumina HiSeq 2500 platform, and raw data quality control was generated using FastQC Reports (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Mapping and counting of sequencing data were performed with Rsubread package 26 against the new version of the human genome GRCh38. Mapping was done with default parameters and set for unique mapping of the reads. Counting was performed using the annotation of Ensembl (GRCh38). For comparisons of gene expression within and between samples, RPKM values (reads per kilobase per million mapped reads, an expression measure) were determined. For quality check purposes, we performed a PCA analysis, a dimension reduction method of the matrix of counts, to explore associations between variables. Samples of the same condition should cluster together in order to ensure consistency and replicability of results.

Code availability
R code for data analysis is available upon request. The R version used for this study was 3.3.2. R-packages Rsubread and edgeR were used with versions 1.24.2 and 3.16.5, respectively.

cDNA preparation and quantitative PCR (qPCR)
Total RNA was extracted using TRIzol ® Reagent (Invitrogen), and cDNA synthesis was performed with ImProm-II™ Reverse Transcription System (Promega), following manufacturer's instructions. Quantitative analysis of transcripts was performed using SYBR ® Green (Applied Biosystems) and LightCycler ® 96 (Roche) equipment. The primers that were used are shown in Table 2, and for each reaction, 5 pmol of primer and 25 ng of cDNA were used. All samples were evaluated in triplicate.

Data Records
Flow cytometry data generated during this study were submitted to the FlowRepository (Data Citation 1). FCS files related to each replicate and cardiac differentiation time evaluated (day 3, day 4, day 9 and day 15) are available.
RNA-seq data related to this study were submitted to the NCBI repository SRA (Data Citation 2). Raw RNA-seq data (paired-end fastq files) as generated by Illumina Hiseq 2500 (Data Citation 2). This site serves as a landing page for the study: description of the project, metadata and raw sequencing files can be found there. Individual accession numbers for each biological sample are also provided in Table 1. Counts data and RPKM can be found in file table_genes_counts.xlsx (Data Citation 3). One tab corresponds to the read counts of each sample, and the other, to RPKM values. Each column of each file

Technical Validation Cardiomyogenic differentiation
The NKX2-5 eGFP/w HES3 cell lineage is a reporter human embryonic stem cell (hESC) line that can be used to derive cardiomyocytes and follow the differentiation through eGFP expression 25 . Here, we used a developmentally staged protocol 18,27 to induce a cardiac mesoderm population on days 3 and 4 and a NKX2-5 + /cTNT + population by day 15 (Fig. 1b). Beating clusters were observed after 10 days of differentiation (Online video I, Data Citation 3), and cardiomyogenic cells were seen by NKX2-5/eGFP expression (Fig. 1c). Immunostaining using cTNI showed the striations characteristic of sarcomere structures on day 20 of differentiation, as representative control of differentiation protocol (Fig. 1d). To follow the differentiation progress, we established two checkpoints during the cardiomyogenesis protocol: (1) presence of CD56+ cells on days 3-4, which corresponds to mesoderm specification 28 , and (2) NKX2-5/GFP+ cells on day 9, meaning cardiac progenitor commitment. Moreover, cTnT expression was also determined on day 15 and considered proportional to the efficiency of differentiation. Those markers were followed by flow cytometry in all replicate experiments (n = 3) (Fig. 2 and Data Citation 1). Samples used for data acquisition yielded 80.17 ± 7.9% of CD56 + on D4, 58.37 ± 7.1% of eGFP + on D9 and on D15 75.67 ± 3.8% of eGFP + and 56.23 ± 4.5% cTNT + cells (Table 1 and Fig. 2).

Polysome profiling
In order to increase the accuracy of the cardiomyogenesis translatome study, we chose to use the polysome profiling methodology to access the polysome-bound RNAs in distinct phases of cardiac differentiation. We performed polysome profiling on D0, D1, D4, D9 and D15 of the differentiation protocol, which represent pluripotency, EB aggregation, cardiac mesoderm, cardiac progenitor and cardiomyocyte stages, respectively (Fig. 1b). Differing densities within the sucrose gradient allowed for the isolation of ribosome-free and polysome-bound RNAs after ultracentrifugation (Fig. 3a). Several fractions were separated using the ISCO gradient fractionation system, while the polysome profiles were recorded using a UV detector (Fig. 3a). The polysome profile derived from each sample was used to determine the fractions that corresponded to ribosome-free (fractions 1-3) and polysome-bound RNAs (fraction 10-22), which were pooled and followed by RNA extraction. One representative image of polysome profile (D0 replicate 2) is shown on Fig. 3a.

RNA analysis, cDNA libraries and sequencing quality control
Isolated RNAs from ribosome-free and polysome-bound fractions were analyzed for quality and concentration to determine their suitability for RNA-sequencing using an Agilent 2100 BioAnalyzer. Figure 3b shows representative examples of quality results of ribosome-free and polysome-bound samples. Polysome-bound samples showed two distinct picks, which represent 18S and 28S ribosomal RNAs. Those peaks were not shown in ribosome-free samples, which was expected, given the absence of  ribosomes. On the other hand, a smaller peak corresponding to tRNAs was observed. All samples measured as high integrity and were considered of high quality to be used on RNA-sequencing. The cDNA libraries prepared with the TruSeq Stranded mRNA Sample Preparation kit were analyzed to determine quality and quantity using an Agilent 2100 Bioanalyzer (Fig. 3c). As examples, representative images generated by this analysis are shown for ribosome-free and polysome-bound  samples. Moreover, cDNA libraries were also quantified by a KAPA Library Quantification kit (data not shown), and these values were used to calculate the sequencing input samples. Raw data derived from sequencing were analyzed using FastQC to determine the quality of the reads by comparing read signals to the probability of accurate base-reading. All samples showed suitable scores, and a ribosome-free and polysome-bound representative analysis is shown (Fig. 3d).

Biological RNA-seq data
A brief sample description is illustrated in Table 3. Three experimental replicates were done for each differentiation time-point and derived ribosome-free and polysome-bound RNA samples. All sequencing data were deposited at the SRA repository (NCBI) (Data Citation 2). Samples were grouped according to type of RNA fraction (ribosome-free vs. polysome-bound) using principal component analysis (PCA) (Fig. 4a) and according to day of differentiation (D0, D1, D4, D9 and D15) (Fig. 4b, c), indicating the reproducibility of biological replicates. Additionally, RPKM values of developmental markers were plotted on a heatmap (Fig. 4d) to show the specificity of mesoderm commitment among the three germ layers. RPKM values of cardiac markers were also plotted (Fig. 4e) to show the higher cardiomyocyte marker expression when compared to markers of endothelial (EC) and smooth muscle (SMC) cells, other cardiac progenitor derivatives.

qPCR validation
To identify if our RNA-seq data were compatible with cardiomyogenesis gene expression, we prepared total RNA samples from the same differentiation time-points for qPCR analysis of developmental and cardiac marker expression. Comparing the log2 fold change of hESC on day 0 (D0), we demonstrated the similarity between RNA-seq and qPCR results. POU5F1 (OCT4) and NANOG are transcription factors expressed in pluripotency conditions which compose the pluripotency core regulatory circuitry 29 . These genes represent markers for the pluripotent state and showed a gradual decrease in expression throughout differentiation in our RNA-seq and qPCR results (Fig. 4f). Developmental markers were also analyzed, as the T-box Brachyury/T, which has a conserved role in mesoderm differentiation 30 , and Eomesodermin (EOMES), which expression marks the earliest cardiac mesoderm and promotes formation of cardiovascular progenitors 31 . The mesodermal markers T and EOMES showed increased expression on D4 (mesoderm stage) and were down-regulated on D9 and D15. Finally, the expression of cardiac-related genes such as GATA4, NKX2-5 and TNNT2 was increased during differentiation (Fig. 4f).