Comparison of miRNA transcriptome of exosomes in three categories of somatic cells with derived iPSCs

Somatic cells can be reprogrammed into induced pluripotent stem cells (iPSCs) through epigenetic manipulation. While the essential role of miRNA in reprogramming and maintaining pluripotency is well studied, little is known about the functions of miRNA from exosomes in this context. To fill this research gap,we comprehensively obtained the 17 sets of cellular mRNA transcriptomic data with 3.93 × 1010 bp raw reads and 18 sets of exosomal miRNA transcriptomic data with 2.83 × 107 bp raw reads from three categories of human somatic cells: peripheral blood mononuclear cells (PBMCs), skin fibroblasts(SFs) and urine cells (UCs), along with their derived iPSCs. Additionally, differentially expressed molecules of each category were identified and used to perform gene set enrichment analysis. Our study provides sets of comparative transcriptomic data of cellular mRNA and exosomal miRNA from three categories of human tissue with three individual biological controls in studies of iPSCs generation, which will contribute to a better understanding of donor cell variation in functional epigenetic regulation and differentiation bias in iPSCs.


Background & Summary
Somatic reprogramming is a common method to manipulate cell lineage through epigenetic modification and induce somatic cells into a close embryonic stage triggered by various pluripotency master transcription factors, such as OCT4, SOX2, KLF4, c-Myc, or NANOG [1][2][3][4] .This somatic epigenetic manipulation has resulted in the generation of personalised induced pluripotent stem cells (iPSCs), which provide tremendous implications in regenerative medicine.During iPSCs generation, epigenetic regulation leads to differential patterns of gene expression through alterations in chromatin structure and modifications of the DNA while still sharing the same genomic sequence as its somatic cells [5][6][7] .Moreover, iPSCs retain epigenetic marks from their somatic source, known as "epigenetic memory", which affects their downstream differentiation ability and inclines the differentiation to their original source [8][9][10][11] .
MiRNAs, a group of small non-coding RNAs with ~22-nt in length, were reported with solid evidence to maintain or manipulate cell lineage, which may be attributed to their ability to control factors involved in cell fate determination or epigenetic regulation [12][13][14][15][16] .For instance, miR-302 has been identified as a well-known gene silencer in reprogramming somatic cells into iPSCs.The miRNA function induces global DNA demethylation by repressing the expression of multiple key epigenetic regulators, such as DNMT1, MECP1/2, and HDAC2/4 12,17,18 .The miR-290 family, called embryonic stem (ES) cell-specific cell cycle regulating miRNAs, was validated to maintain the rapid proliferative state of ES cells by regulating the G1-S phase transition.Moreover, miR-9 and miR-124a, which are predominantly expressed in neurons, have been demonstrated to regulate the formation and proliferation of the neural lineage derived from ES cells based on control of STAT3 phosphorylation 19 .Interestingly, miRNAs can be regulated by various epigenetic modifications, including DNA methylation, RNA modification, and histone modifications, which further exerts extensive influence on gene expression profile [20][21][22][23][24] .Dysregulation of the miRNA-epigenetic feedback loop has been validated to interfere with the physiological and pathological processes, but its specific role in cell fate determination of iPSCs remains poor understood.
Exosomes, one of the smallest extracellular vesicles (EVs) secreted in various cell types, act as bioactive vesicles in cell-to-cell communication by carrying proteins, miRNAs and other factors 25,26 .In the cell microenvironment, exosomal miRNA can be taken up by neighbour cells or distant cells and subsequently regulate the epigenetics of recipient cells.It was reported that exosomal miRNAs contribute significantly to the maintenance of pluripotency or other specific cell fate in their niche [27][28][29][30] .Notably, it was unveiled that about 70% of the miR-NAs identified in iPSCs were also present in iPSC-EVs 28 , which indicates miRNAs were efficiently transferred from iPSCs to EVs for regulating pluripotent signalling.While the variate of exosomal miRNAs during reprogramming was limited to investigation, their roles in regulating cell fate need to be further studied.
To further understand the differentiation bias originating from somatic cells and the role of exosomal miRNA in regulating epigenetic heterogeneity during the generation of human iPSCs (hiPSCs), we simultaneously collected transcriptome data sets from the three most common somatic cell sources: skin fibroblasts (SFs), peripheral blood mononuclear cells (PBMCs), and urine cells (UCs), along with their derived iPSCs (Fig. 1).In order to minimise the biological variation, we recruited three healthy male donors within a similar age group (25-30 years old) and from the same genetic population (southern Han nationality represents about half of the Chinese population, approximately 10% of the world's population 31,32 ).Comparative data were generated before and after reprogramming, resulting in 17 sets of cellular RNA-Seq data and 18 sets of exosome-derived small RNA sequencing data.Subsequently, an in-house developed workflow was implemented to analyse the comparative transcriptomics data, including quality validation, differential expression analysis and gene set enrichment analysis.Our work provides a valuable resource for future investigations into donor cell variation in functional epigenetic regulation and differentiation bias in regenerative medicine.

Ethical approval. All samples were collected following the guidelines established by the Human Subject
Research Ethics Committee at Guangzhou Institute of Biomedicine and Health (GIBH), the Chinese Academy of Sciences (CAS).The experiments were approved by the ethical committee under the approval number GIBH-IRB07-2015083.Prior to sample donation, all volunteers who donated skin, urine or blood samples had been thoroughly informed about the content, purposes, possible risks, and benefits of the experiment through a consent form, and provided their permission for genetic material data to be shared.
Collecting and culturing the human primary somatic cells.PBMCs, SFs and UCs were isolated from three healthy males aged from 25 to 30 and satisfied specific criteria.These criteria included having a normal BMI value, no family history of genetic disease and major surgery, and not smoking and alcohol consumption.Additionally, the annual health examination reports of all three volunteers had been thoroughly reviewed, indicating their physical well-being: no chronic illnesses or infectious diseases, with the standard range for blood pressure, heart rate, standard levels for blood chemistry, including cholesterol, glucose, liver function, kidney function, and absence of any medical conditions that significantly affect bodily functions or absence of diagnosed mental disorders.Notably, all three volunteers belonged to the southern Han nationality, representing approximately half of the Chinese national population and around 10% of the global population.The cell culture conditions were referred to in the previous publications 33,34 .establishment of hiPSCs from different sources.HiPSCs derived from SFs and UCs were generated based on the method described in the previous study 33 .The reprogramming procedure of UCs can be found in our previous protocol 34 .Reprogramming of human PBMCs was conducted with minor modification based on a published study 35 .Precisely, co-transfection of two episomal plasmids (pEP4-EO2SET2K and pEP4-M2L) and a vector containing hmiR302 cluster was performed in human PBMCs using Amaxa TM Basic Nucleofector TM Kit (Lonza), then these PBMCs were seeded on 6-well cell culture plate.
Characterisation of the hiPSCs.The karyotypes of hiPSCs derived from three types of somatic cell sources were detected using G band techniques.The presence of inserted genes from the reprogramming plasmids were demonstrated by PCR and gel-imaging system, as described in our previous research 33 .The protocols of immunofluorescence and quantitative real-time PCR analysis were referred to our previous studies 33,34 .Approximately 1 × 10 6 iPSCs were suspended in 100 μl Matrigel (diluted by DMEM/F12 1:1) and subcutaneously injected into the back of NOD/SCID mice.After teratoma formation, tumours were stained with haematoxylin-eosin and observed using an Olympus IX73 microscope.
RNA extraction, library construction, and illumina sequencing.Total RNA was isolated using TRIzol reagent (Thermo Fisher) following its standard protocols.RNA qualification, library construction and sequencing were performed as our previous publications 33,34 .
exosomal miRNA extraction, library construction, and illumina sequencing.Exosomes were isolated from the cell culture medium using exoRNeasy Serum/Plasma Maxi Kit (Qiagen) following the provided instructions.HiPSCs with passage numbers ranging from 21 to 28 were used to isolate the exosomes in our experiments.Exosomal miRNA was extracted utilizing the exoRNeasy Mini Kit (Qiagen).After quantifying and qualifying the RNA, 3 μg of total RNA was used for the construction of sequencing libraries through NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB, USA).The quality of each library was assessed by the Agilent Bioanalyzer 2100 system, and sequencing was done on an Illumina Hiseq 2500/2000 platform.
Preprocessing of RNA sequencing data.Raw sequencing data were processed by fastp v0.20.1 36 with default parameters to remove adapter sequences, low-quality reads and short-length sequences.The resulting clean reads were then mapped to the human genome hg38 to quantify global gene expression using the   and the correlation coefficient calculation.Differentially expressed genes (DEGs) between somatic cells and their derived hiPSCs were determined by DESeq 2 v1.30.1 37 .DEGs were identified with an adjusted P-value < 0.05 and Table 3.The number of differently expressed genes (DEGs) between hiPSCs and somatic cells.BiPS: PBMCsderived hiPSCs, FiPS:SFs-derived hiPSCs, UiPS: UCs-derived hiPSCs.The genes with |log2FoldChange (FC)| ≥ 1, and an adjusted P-value < 0.05 were determined as DEGs.
the absolute value of log 2 FoldChange > 1.While TPM matrix information was used to evaluate the expression levels of differentiation and pluripotency related genes.Gene set enrichment analysis was conducted by clus-terProfiler v4.4.4 38 based on Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases.
Preprocessing of small RNA sequencing data.FastQC (http://www.bioinformatics.babraham. ac.uk/projects/fastqc/) was utilized to verify the sequence quality by assessing parameters such as Q20, Q30, GC-content and adapter sequences.The raw data were mapped to the human genome hg38 and human miRNA sequences from miRBase 39 v22 to predict novel miRNAs and evaluate the expression levels of known miRNAs by miRDeep 40 v2.0.1.2 in multiple samples mode.The mapping number of each read in the human genome hg38 was constrained by a maximum of 5. Low-expression miRNAs with an average expression of less than 1 were filtered out, and then log 2 (CPM + 1) values of miRNA in each sample were used to perform the principal component analysis (PCA) and calculate the correlation coefficient.Differentially expressed miRNAs (DEMs) were identified Table 5. Summary of small RNA-seq reads mapping results.
using the same method as for DEGs.The miRNA targeted genes were analysed by multiMiR 41 v1.12.0 based on its database v2.3, which incorporated three validated miRNA-target interactions databases (miRecoord, miRtarBase and TarBase) and 8 predicted miRNA-target interactions databases (DIANA-microT-CDS, RIMMo, MicroCosm, miRDB, PicTar, PITA and targetScan).The down-regulated DEGs were used to explore the up-regulated DEMs target genes, while the up-regulated DEGs were used to explore the down-regulated DEMs target genes.The targeted genes found in validated databases or in more than three predicted databases underwent GO and KEGG pathway enrichment, as described above.

Data Records
Our raw data, consisting of 17 RNA-seq and 18 exosomes' small RNA-seq data sets, was stored in the Genome Sequence Archive 42 in National Genomics Data Center (NGDC) 43 with the accession number HRA003697 44 .The corresponding expression matrix information was deposited in NGDC of China National Center for Bioinformation with the accession number PRJCA013662 45 .
technical Validation the characteristics of hiPSCs.Karyotype analysis indicated that the chromosome profiles of hiP-SCs derived from PBMCs, SFs and UCs were without abnormality (Fig. 2a).The exogenous episomal DNA (OCT4, SOX2, KLF4, MicroRNA302-367 or c-Myc) was absent in all three somatic cell-derived hiPSCs (Fig. 2b).Compared with their respective somatic cells, the mRNA expression levels of OCT4, SOX2 and NANOG were significantly increased in hiPSCs from three categories (Fig. 2c).Immunofluorescence result revealed higher expression levels of the pluripotent protein markers (OCT4, SSEA4, TRA-1-60, and TRA-1-81) in hiPSCs derived from the three somatic cell types compared to their respective somatic cells (Fig. 2d).The pluripotent potential of hiPSCs derived from the three types of somatic cells was further investigated by teratoma formation, which exhibited their ability to differentiate into three germ-layer (Fig. 2e).These results illustrated that our hiPSCs have similar pluripotent characteristics to human embryonic stem cells.ranging from 66% to 88% and the unique mapping rate ranging from 15% to 22%, much lower than that of the multiple mapping rate (Table 2).

Genes expression analysis.
Gene expression analysis was performed using three biologically replicated samples, except that SFs-derived hiPSCs had only two samples.The correlation coefficients calculated based on Pearson correlation and PCA analysis implied good repeatability within the biological replicates and high similarity among hiPSCs derived from the three kinds of somatic cells (Fig. 3a,b).The gene expression levels of the pluripotency related genes (PPA2, GDF3, TERT, NANOG, ZFP42, FGF4, LIN28A, DPPA4, POU5F1, and SOX2) were significantly higher in hiPSCs than that in somatic cells, whereas the gene expression levels of differentiation related genes (NR2F2, ANPEP, and SOX17) were significantly lower (Fig. 3c).The analysis of differentially expressed genes (DEGs) between somatic cells and their derived hiPSCs indicated significant changes from a differentiated state to a pluripotency state.In these sets, a total of 10,203 genes, 10,026 genes and 9,330 genes were identified as DEGs (Table 3), respectively, including 4446 shared DEGs with 2279 consistently up-regulated DEGs and 1423 consistently down-regulated DEGs.(Fig. 3d).In addition, the top 20 pathways identified in the enrichment analysis of DEGs based on the biological process GO and KEGG indicated that immune-related pathways were mostly down-regulated in PBMCs-derived hiPSCs compared to PBMCs.Up-regulated synaptic signalling and down-regulated embryonic skeletal system development were both observed in the SFs and UCs categories (Fig. 3e,f).
Processing the small RNA sequencing data.A total of 2.83 × 10 7 raw reads were generated, ranging from 1.1 to 2.2 × 10 6 raw reads per sample.The Q20 rate was more than 0.99, the Q30 rate was more than 0.98, and the GC content of raw reads with a mean length of 50 bp was approximately 0.53 (Table 4).In summary, the small RNA-seq sequencing data have quite good quality.The mapping rate of small RNA reads to the hg38 genome per sample ranged from 19% to 62%, while the mapping rate of reads to human miRNA sequences ranged from 0.5% to 12%, with an average rate of 2.50% (Table 5).According to data GSE216556 46 deposited on the NCBI GEO database, the mapping rates of small RNA sequences extracted from exosomes to human miRNA sequences ranged from 1.26% to 2.68%, indicating a lower alignment rate of miRNA from exosomes compared to the cellular datasets.The results of correlation coefficients and the PCA analysis implied good repeatability within the biological replicates, except for sample Us3, which displays a closer relationship to the PBMCs exosomes (Fig. 4a,b).The exosomal miRNA data generated from three categories of hiPSCs exhibited a high degree of similarity, vastly different from their somatic miRNA samples.A total of 248, 137 and 106 miRNAs were separately identified as differentially expressed miRNAs (DEMs) in these three groups, including shared 72 DEMs with 68 consistently up-regulated DEMs and 5 consistently down-regulated DEMs (Table 6, Figs.3d, 4c).The items of DEGs targeted by DEMs were summarised in Table 7.The gene set enrichment analysis of targeted DEGs for each group based on biological process GO, and KEGG was conducted and revealed similar enriched pathways to DEGs (Table 8).The top 20 enriched pathways were shown in Fig. 4e,f.

Fig. 1
Fig. 1 Schematic workflow of this investigation.(a) Exosome miRNA and total mRNA were collected from three categories of somatic cell and their derived iPSCs, along with identifying hiPSCs characteristics.(b) An overview of the analysis flow of miRNA and mRNA data.

Fig. 2
Fig. 2 Characteristics of hiPSCs.(a) The karyotype of hiPSCs derived from three somatic cell types with three samples.(b) The presence of exogenous episomal DNA in hiPSC was identified by agarose gel electrophoresis.Somatic cells transfected by episomal DNA served as the positive control, while the H1 cell line and somatic cells served as the negative controls; GAPDH served as the internal reference.B: PBMCs, F: SFs, U: UCs, i: hiPSCs.The specimens Fi1, Bi2 and Ui3 were chosen to show the results.(c) The expression levels of NANOG OCT4, and SOX2 in somatic cells, their derived hiPSCs and H1 cells were evaluated by qRT-PCR.The gene expression level in each sample was detected triple times, and its mean expression was used as its expression value.Each point represents a sample value.The P-value was calculated by Student's t-test.*P < 0.05, **P < 0.01, ***P < 0.001.(d) The detection of OCT4, SOX2, SSEA4, TRA-1-60 and TRA-1-81 by immunostaining.scale bar: 200 um.(e) The histology of teratomas induced from hiPSCs derived from different somatic cells.The hiPSCs generated from sample 2 were chosen to show as an example.Teratomas were stained with H&E.

Fig. 3
Fig. 3 Gene expression among somatic cells and hiPSCs.(a) Principal components analysis of different somatic cells and their derived hiPSCs.(b) Correltion analysis of different somatic cells and their derived hiPSCs.(c) The mRNA expression levels of the pluripotency and differentiation related genes.(d) Venn plot of DEGs among the three groups.Bips_B: PBMCs-derived hiPSCs Vs.PBMCs, Fips_F: SFs-derived hiPSCs Vs.SFs, Uips_U: UCs-derived hiPSCs Vs.UCs.(e,f) Gene set enrichment analysis of DEGs based on biological process GO and KEGG databases, respectively.The top 20 enriched pathways of each group were displayed.

Fig. 4
Fig. 4 Expression of miRNA in exosomes from somatic cells and hiPSCs.(a) Principal components analysis applied to different somatic cells and their derived hiPSCs.(b) Correlation analysis of different somatic cells and their derived hiPSCs.(c) Venn plot of differentially expressed miRNA (DEMs) among the three groups, Bips_B: PBMCs-derived hiPSCs Vs.PBMCs, Fips_F: SFs-derived hiPSCs Vs.SFs, Uips_U: UCs-derived hiPSCs Vs.UCs.(d) The expression profile of the shared DEMs (e,f) Gene set enrichment analysis of DEGs targeted by DEMs based on GO and KEGG databases, respectively.The top 20 enriched pathways of each group were shown.

Table 4 .
Quality summary of small RNA-seq data.

Table 7 .
The number of DEGs targeted by DEMs.B: PBMCs-derived hiPSCs vs. PBMCs, F: SFs-derived hiPSCs vs. SFs, U: UCs-derived hiPSCs vs. UCs.dnDEM2upDEG illustrated up-regulated DEGs are used to search the targeted genes of down-regulated DEMs, upDEM2dnDEG illustrated down-regulated DEGs are used to search the targeted genes of up-regulated DEMs.

Table 8 .
The number of enriched pathways of DEGs and their targeting DEGs based on biological process GO and KEGG.DEMs2DEGs: DEGs targeted by DEMs.Bips_B: PBMC-derived hiPSCs vs. PBMCs, Fips_F: SFderived hiPSCs vs. SFs, Uips_U: UC-derived hiPSCs vs. UCs.High-throughput RNA sequencing generated 1.2~2.0× 10 7 raw reads per sample, with the Q20 > 0.95, Q30 > 0.90, GC-content close to 0.50, and the mean length of 149 bp for clean reads(Table1).Most of the pair reads were aligned to the hg38 genome, with the sample align rate Quality control of RNA sequencing data.