Background & Summary

Somatic reprogramming is a common method to manipulate cell lineage through epigenetic modification and induce somatic cells into a close embryonic stage triggered by various pluripotency master transcription factors, such as OCT4, SOX2, KLF4, c-Myc, or NANOG1,2,3,4. This somatic epigenetic manipulation has resulted in the generation of personalised induced pluripotent stem cells (iPSCs), which provide tremendous implications in regenerative medicine. During iPSCs generation, epigenetic regulation leads to differential patterns of gene expression through alterations in chromatin structure and modifications of the DNA while still sharing the same genomic sequence as its somatic cells5,6,7. Moreover, iPSCs retain epigenetic marks from their somatic source, known as “epigenetic memory”, which affects their downstream differentiation ability and inclines the differentiation to their original source8,9,10,11.

MiRNAs, a group of small non-coding RNAs with ~22-nt in length, were reported with solid evidence to maintain or manipulate cell lineage, which may be attributed to their ability to control factors involved in cell fate determination or epigenetic regulation12,13,14,15,16. For instance, miR-302 has been identified as a well-known gene silencer in reprogramming somatic cells into iPSCs. The miRNA function induces global DNA demethylation by repressing the expression of multiple key epigenetic regulators, such as DNMT1, MECP1/2, and HDAC2/412,17,18. The miR-290 family, called embryonic stem (ES) cell-specific cell cycle regulating miRNAs, was validated to maintain the rapid proliferative state of ES cells by regulating the G1-S phase transition. Moreover, miR-9 and miR-124a, which are predominantly expressed in neurons, have been demonstrated to regulate the formation and proliferation of the neural lineage derived from ES cells based on control of STAT3 phosphorylation19. Interestingly, miRNAs can be regulated by various epigenetic modifications, including DNA methylation, RNA modification, and histone modifications, which further exerts extensive influence on gene expression profile20,21,22,23,24. Dysregulation of the miRNA-epigenetic feedback loop has been validated to interfere with the physiological and pathological processes, but its specific role in cell fate determination of iPSCs remains poor understood.

Exosomes, one of the smallest extracellular vesicles (EVs) secreted in various cell types, act as bioactive vesicles in cell-to-cell communication by carrying proteins, miRNAs and other factors25,26. In the cell microenvironment, exosomal miRNA can be taken up by neighbour cells or distant cells and subsequently regulate the epigenetics of recipient cells. It was reported that exosomal miRNAs contribute significantly to the maintenance of pluripotency or other specific cell fate in their niche27,28,29,30. Notably, it was unveiled that about 70% of the miRNAs identified in iPSCs were also present in iPSC- EVs28, which indicates miRNAs were efficiently transferred from iPSCs to EVs for regulating pluripotent signalling. While the variate of exosomal miRNAs during reprogramming was limited to investigation, their roles in regulating cell fate need to be further studied.

To further understand the differentiation bias originating from somatic cells and the role of exosomal miRNA in regulating epigenetic heterogeneity during the generation of human iPSCs (hiPSCs), we simultaneously collected transcriptome data sets from the three most common somatic cell sources: skin fibroblasts (SFs), peripheral blood mononuclear cells (PBMCs), and urine cells (UCs), along with their derived iPSCs (Fig. 1). In order to minimise the biological variation, we recruited three healthy male donors within a similar age group (25–30 years old) and from the same genetic population (southern Han nationality represents about half of the Chinese population, approximately 10% of the world’s population31,32). Comparative data were generated before and after reprogramming, resulting in 17 sets of cellular RNA-Seq data and 18 sets of exosome-derived small RNA sequencing data. Subsequently, an in-house developed workflow was implemented to analyse the comparative transcriptomics data, including quality validation, differential expression analysis and gene set enrichment analysis. Our work provides a valuable resource for future investigations into donor cell variation in functional epigenetic regulation and differentiation bias in regenerative medicine.

Fig. 1
figure 1

Schematic workflow of this investigation. (a) Exosome miRNA and total mRNA were collected from three categories of somatic cell and their derived iPSCs, along with identifying hiPSCs characteristics. (b) An overview of the analysis flow of miRNA and mRNA data.


Ethical approval

All samples were collected following the guidelines established by the Human Subject Research Ethics Committee at Guangzhou Institute of Biomedicine and Health (GIBH), the Chinese Academy of Sciences (CAS). The experiments were approved by the ethical committee under the approval number GIBH-IRB07-2015083. Prior to sample donation, all volunteers who donated skin, urine or blood samples had been thoroughly informed about the content, purposes, possible risks, and benefits of the experiment through a consent form, and provided their permission for genetic material data to be shared.

Collecting and culturing the human primary somatic cells

PBMCs, SFs and UCs were isolated from three healthy males aged from 25 to 30 and satisfied specific criteria. These criteria included having a normal BMI value, no family history of genetic disease and major surgery, and not smoking and alcohol consumption. Additionally, the annual health examination reports of all three volunteers had been thoroughly reviewed, indicating their physical well-being: no chronic illnesses or infectious diseases, with the standard range for blood pressure, heart rate, standard levels for blood chemistry, including cholesterol, glucose, liver function, kidney function, and absence of any medical conditions that significantly affect bodily functions or absence of diagnosed mental disorders. Notably, all three volunteers belonged to the southern Han nationality, representing approximately half of the Chinese national population and around 10% of the global population. The cell culture conditions were referred to in the previous publications33,34.

Establishment of hiPSCs from different sources

HiPSCs derived from SFs and UCs were generated based on the method described in the previous study33. The reprogramming procedure of UCs can be found in our previous protocol34. Reprogramming of human PBMCs was conducted with minor modification based on a published study35. Precisely, co-transfection of two episomal plasmids (pEP4-EO2SET2K and pEP4-M2L) and a vector containing hmiR302 cluster was performed in human PBMCs using AmaxaTM Basic NucleofectorTM Kit (Lonza), then these PBMCs were seeded on 6-well cell culture plate.

Characterisation of the hiPSCs

The karyotypes of hiPSCs derived from three types of somatic cell sources were detected using G band techniques. The presence of inserted genes from the reprogramming plasmids were demonstrated by PCR and gel-imaging system, as described in our previous research33. The protocols of immunofluorescence and quantitative real-time PCR analysis were referred to our previous studies33,34. Approximately 1 × 106 iPSCs were suspended in 100 μl Matrigel (diluted by DMEM/F12 1:1) and subcutaneously injected into the back of NOD/SCID mice. After teratoma formation, tumours were stained with haematoxylin-eosin and observed using an Olympus IX73 microscope.

RNA extraction, library construction, and Illumina sequencing

Total RNA was isolated using TRIzol reagent (Thermo Fisher) following its standard protocols. RNA qualification, library construction and sequencing were performed as our previous publications33,34.

Exosomal miRNA extraction, library construction, and Illumina sequencing

Exosomes were isolated from the cell culture medium using exoRNeasy Serum/Plasma Maxi Kit (Qiagen) following the provided instructions. HiPSCs with passage numbers ranging from 21 to 28 were used to isolate the exosomes in our experiments. Exosomal miRNA was extracted utilizing the exoRNeasy Mini Kit (Qiagen). After quantifying and qualifying the RNA, 3 μg of total RNA was used for the construction of sequencing libraries through NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB, USA). The quality of each library was assessed by the Agilent Bioanalyzer 2100 system, and sequencing was done on an Illumina Hiseq 2500/2000 platform.

Preprocessing of RNA sequencing data

Raw sequencing data were processed by fastp v0.20.136 with default parameters to remove adapter sequences, low-quality reads and short-length sequences. The resulting clean reads were then mapped to the human genome hg38 to quantify global gene expression using the expectation-Maximization method implemented in RSEM v1.2.22. Gene count and transcripts per million (TPM) matrix information was obtained for each sample. After filtering low-expression genes with an average expression less than 1, the log2(CPM + 1) values of each sample were used for the principal component analysis (PCA) and the correlation coefficient calculation. Differentially expressed genes (DEGs) between somatic cells and their derived hiPSCs were determined by DESeq 2 v1.30.137. DEGs were identified with an adjusted P-value < 0.05 and the absolute value of log2FoldChange > 1. While TPM matrix information was used to evaluate the expression levels of differentiation and pluripotency related genes. Gene set enrichment analysis was conducted by clusterProfiler v4.4.438 based on Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases.

Preprocessing of small RNA sequencing data

FastQC ( was utilized to verify the sequence quality by assessing parameters such as Q20, Q30, GC-content and adapter sequences. The raw data were mapped to the human genome hg38 and human miRNA sequences from miRBase39 v22 to predict novel miRNAs and evaluate the expression levels of known miRNAs by miRDeep40 v2.0.1.2 in multiple samples mode. The mapping number of each read in the human genome hg38 was constrained by a maximum of 5. Low-expression miRNAs with an average expression of less than 1 were filtered out, and then log2(CPM + 1) values of miRNA in each sample were used to perform the principal component analysis (PCA) and calculate the correlation coefficient. Differentially expressed miRNAs (DEMs) were identified using the same method as for DEGs. The miRNA targeted genes were analysed by multiMiR41 v1.12.0 based on its database v2.3, which incorporated three validated miRNA-target interactions databases (miRecoord, miRtarBase and TarBase) and 8 predicted miRNA-target interactions databases (DIANA-microT-CDS, RIMMo, MicroCosm, miRDB, PicTar, PITA and targetScan). The down-regulated DEGs were used to explore the up-regulated DEMs target genes, while the up-regulated DEGs were used to explore the down-regulated DEMs target genes. The targeted genes found in validated databases or in more than three predicted databases underwent GO and KEGG pathway enrichment, as described above.

Data Records

Our raw data, consisting of 17 RNA-seq and 18 exosomes’ small RNA-seq data sets, was stored in the Genome Sequence Archive42 in National Genomics Data Center (NGDC)43 with the accession number HRA00369744. The corresponding expression matrix information was deposited in NGDC of China National Center for Bioinformation with the accession number PRJCA01366245.

Technical Validation

The characteristics of hiPSCs

Karyotype analysis indicated that the chromosome profiles of hiPSCs derived from PBMCs, SFs and UCs were without abnormality (Fig. 2a). The exogenous episomal DNA (OCT4, SOX2, KLF4, MicroRNA302-367 or c-Myc) was absent in all three somatic cell-derived hiPSCs (Fig. 2b). Compared with their respective somatic cells, the mRNA expression levels of OCT4, SOX2 and NANOG were significantly increased in hiPSCs from three categories (Fig. 2c). Immunofluorescence result revealed higher expression levels of the pluripotent protein markers (OCT4, SSEA4, TRA-1-60, and TRA-1-81) in hiPSCs derived from the three somatic cell types compared to their respective somatic cells (Fig. 2d). The pluripotent potential of hiPSCs derived from the three types of somatic cells was further investigated by teratoma formation, which exhibited their ability to differentiate into three germ-layer (Fig. 2e). These results illustrated that our hiPSCs have similar pluripotent characteristics to human embryonic stem cells.

Fig. 2
figure 2

Characteristics of hiPSCs. (a) The karyotype of hiPSCs derived from three somatic cell types with three samples. (b) The presence of exogenous episomal DNA in hiPSC was identified by agarose gel electrophoresis. Somatic cells transfected by episomal DNA served as the positive control, while the H1 cell line and somatic cells served as the negative controls; GAPDH served as the internal reference. B: PBMCs, F: SFs, U: UCs, i: hiPSCs. The specimens Fi1, Bi2 and Ui3 were chosen to show the results. (c) The expression levels of NANOG OCT4, and SOX2 in somatic cells, their derived hiPSCs and H1 cells were evaluated by qRT-PCR. The gene expression level in each sample was detected triple times, and its mean expression was used as its expression value. Each point represents a sample value. The P-value was calculated by Student’s t-test. *P < 0.05, **P < 0.01, ***P < 0.001. (d) The detection of OCT4, SOX2, SSEA4, TRA-1-60 and TRA-1-81 by immunostaining. scale bar: 200 um. (e) The histology of teratomas induced from hiPSCs derived from different somatic cells. The hiPSCs generated from sample 2 were chosen to show as an example. Teratomas were stained with H&E.

Quality control of RNA sequencing data

High-throughput RNA sequencing generated 1.2~2.0 × 107 raw reads per sample, with the Q20 > 0.95, Q30 > 0.90, GC-content close to 0.50, and the mean length of 149 bp for clean reads(Table 1). Most of the pair reads were aligned to the hg38 genome, with the sample align rate ranging from 66% to 88% and the unique mapping rate ranging from 15% to 22%, much lower than that of the multiple mapping rate (Table 2).

Table 1 RNA-seq data quality summary.
Table 2 Summary of RNA-seq reads mapping results.

Genes expression analysis

Gene expression analysis was performed using three biologically replicated samples, except that SFs-derived hiPSCs had only two samples. The correlation coefficients calculated based on Pearson correlation and PCA analysis implied good repeatability within the biological replicates and high similarity among hiPSCs derived from the three kinds of somatic cells (Fig. 3a,b). The gene expression levels of the pluripotency related genes (PPA2, GDF3, TERT, NANOG, ZFP42, FGF4, LIN28A, DPPA4, POU5F1, and SOX2) were significantly higher in hiPSCs than that in somatic cells, whereas the gene expression levels of differentiation related genes (NR2F2, ANPEP, and SOX17) were significantly lower (Fig. 3c). The analysis of differentially expressed genes (DEGs) between somatic cells and their derived hiPSCs indicated significant changes from a differentiated state to a pluripotency state. In these sets, a total of 10,203 genes, 10,026 genes and 9,330 genes were identified as DEGs (Table 3), respectively, including 4446 shared DEGs with 2279 consistently up-regulated DEGs and 1423 consistently down-regulated DEGs. (Fig. 3d). In addition, the top 20 pathways identified in the enrichment analysis of DEGs based on the biological process GO and KEGG indicated that immune-related pathways were mostly down-regulated in PBMCs-derived hiPSCs compared to PBMCs. Up-regulated synaptic signalling and down-regulated embryonic skeletal system development were both observed in the SFs and UCs categories (Fig. 3e,f).

Fig. 3
figure 3

Gene expression among somatic cells and hiPSCs. (a) Principal components analysis of different somatic cells and their derived hiPSCs. (b) Correltion analysis of different somatic cells and their derived hiPSCs. (c) The mRNA expression levels of the pluripotency and differentiation related genes. (d) Venn plot of DEGs among the three groups. Bips_B: PBMCs-derived hiPSCs Vs. PBMCs, Fips_F: SFs-derived hiPSCs Vs. SFs, Uips_U: UCs-derived hiPSCs Vs. UCs. (e,f) Gene set enrichment analysis of DEGs based on biological process GO and KEGG databases, respectively. The top 20 enriched pathways of each group were displayed.

Table 3 The number of differently expressed genes (DEGs) between hiPSCs and somatic cells.

Processing the small RNA sequencing data

A total of 2.83 × 107 raw reads were generated, ranging from 1.1 to 2.2 × 106 raw reads per sample. The Q20 rate was more than 0.99, the Q30 rate was more than 0.98, and the GC content of raw reads with a mean length of 50 bp was approximately 0.53 (Table 4). In summary, the small RNA-seq sequencing data have quite good quality. The mapping rate of small RNA reads to the hg38 genome per sample ranged from 19% to 62%, while the mapping rate of reads to human miRNA sequences ranged from 0.5% to 12%, with an average rate of 2.50% (Table 5). According to data GSE21655646 deposited on the NCBI GEO database, the mapping rates of small RNA sequences extracted from exosomes to human miRNA sequences ranged from 1.26% to 2.68%, indicating a lower alignment rate of miRNA from exosomes compared to the cellular datasets. The results of correlation coefficients and the PCA analysis implied good repeatability within the biological replicates, except for sample Us3, which displays a closer relationship to the PBMCs exosomes (Fig. 4a,b). The exosomal miRNA data generated from three categories of hiPSCs exhibited a high degree of similarity, vastly different from their somatic miRNA samples. A total of 248, 137 and 106 miRNAs were separately identified as differentially expressed miRNAs (DEMs) in these three groups, including shared 72 DEMs with 68 consistently up-regulated DEMs and 5 consistently down-regulated DEMs (Table 6, Figs. 3d, 4c). The items of DEGs targeted by DEMs were summarised in Table 7. The gene set enrichment analysis of targeted DEGs for each group based on biological process GO, and KEGG was conducted and revealed similar enriched pathways to DEGs (Table 8). The top 20 enriched pathways were shown in Fig. 4e,f.

Table 4 Quality summary of small RNA-seq data.
Table 5 Summary of small RNA-seq reads mapping results.
Fig. 4
figure 4

Expression of miRNA in exosomes from somatic cells and hiPSCs. (a) Principal components analysis applied to different somatic cells and their derived hiPSCs. (b) Correlation analysis of different somatic cells and their derived hiPSCs. (c) Venn plot of differentially expressed miRNA (DEMs) among the three groups, Bips_B: PBMCs-derived hiPSCs Vs. PBMCs, Fips_F: SFs-derived hiPSCs Vs. SFs, Uips_U: UCs-derived hiPSCs Vs. UCs. (d) The expression profile of the shared DEMs (e,f) Gene set enrichment analysis of DEGs targeted by DEMs based on GO and KEGG databases, respectively. The top 20 enriched pathways of each group were shown.

Table 6 The number of DEMs between hiPSCs and somatic cells.
Table 7 The number of DEGs targeted by DEMs.
Table 8 The number of enriched pathways of DEGs and their targeting DEGs based on biological process GO and KEGG.