RNA-seq profiling of white and brown adipocyte differentiation treated with epigallocatechin gallate

Due to serious adverse effects, many approved anti-obesity medicines have been withdrawn, and the selection of safer natural ingredients is of great interest. Epigallocatechin gallate (EGCG) is one of the major green tea catechins and has been demonstrated to possess an anti-obesity function by regulating both white and brown adipose tissue activity. However, there are currently no publicly available studies describing the effects of EGCG on the two distinct adipose tissue transcriptomes. Stromal vascular fraction (SVF) cells derived from adipose tissue are a classic cell model for studying adipogenesis and fat accumulation. In the current study, primary white adipose tissue (WAT) and brown adipose tissue (BAT) SVF cells were isolated and induced to undergo adipogenic differentiation in the presence or absence of EGCG. RNA-seq was used to determine the genes regulated by EGCG and to identify the key differences between the two functionally distinct adipose tissues. Taken together, we provide detailed stage- and tissue-specific gene expression profiles affected by EGCG. These data will be valuable for obesity-related clinical and basic research.

thermogenesis and mitochondrial biogenesis in brown adipose tissues 12. However, a systematic comparison of the transcriptomes affected by EGCG in both WAT and BAT is lacking, and such a comparison is of great interest to the field.
To address this question, primary WAT and BAT preadipocytes were isolated and induced to undergo adipogenic differentiation in the presence or absence of EGCG. Samples were collected on Day 4 and Day 8 post differentiation, which represent the early and mature stages of adipogenesis, respectively. We then performed Illumina RNA-seq and bioinformatics analysis of the mRNA profiles (Fig. 1). This dataset captures the stage- and tissue-specific effects of EGCG on adipogenesis and fat storage, and may be a useful resource for researchers developing therapies for obesity and other metabolic diseases.

Methods
Cell culture. All procedures involving mice were approved by the Xinyang Normal University Animal Care and Use Committee. Primary WAT and BAT stromal vascular fraction (SVF) cells were isolated as previously described 13,14. Briefly, inguinal WAT and interscapular BAT were obtained from 6-week-old mice on a C57BL/6J background. Fat pads were cut into small pieces and incubated with collagenase digestion solution (1.5 mg ml−1, #SCR103, Sigma-Aldrich); WAT was incubated for 30 min and BAT for 50 min. An equal volume of growth medium (DMEM containing 20% FBS) was then added to terminate digestion. To remove tissue debris, the digest was filtered through 100-μm and 70-μm cell strainers and centrifuged at 450 g for 8 min. The SVF cell pellet was resuspended and cultured in growth medium at 37 °C with 5% CO2. When the cells reached 90% confluence, the growth medium was changed to DMEM containing 10% FBS, and the cells were induced to undergo adipogenic differentiation by supplementation with a cocktail containing 2.85 mM recombinant human insulin (#I8830, Solarbio), 0.3 mM dexamethasone (#D8040, Solarbio), and 0.63 mM 3-isobutyl-1-methylxanthine (#I7018, Sigma-Aldrich). After four days, the cocktail was changed to 200 nM insulin and 10 nM T3 (#T6397, Sigma-Aldrich) to induce mature adipocytes. EGCG (5 μM) or DMSO control (1:1000) was added to the induction medium and differentiation medium from Day 4 to 8 of adipogenesis. The medium was changed every 2 days. To examine the lipid droplets, Oil Red O staining was performed. Briefly, the mature adipocytes were washed twice with PBS and fixed with 10% formaldehyde for 5 min. The cells were then stained with Oil Red O staining solution (#G1262, Solarbio) for 30 min. After staining, images were captured with an Axio Observer 3 Zeiss microscope (Carl Zeiss, Germany).

RNA preparation.
TRIzol (#15596026, Thermo Fisher Scientific) was used to extract RNA. RNA concentration was measured using a Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, USA). A Bioanalyzer 2100 system (Agilent Technologies, USA) was used to examine RNA integrity.

Real-time PCR analysis.
Real-time PCR was performed as previously described 15. Briefly, a Reverse Transcription Kit (#RR037A, Takara) was used to synthesize cDNA. Real-time PCR was performed on a CFX96 Real-Time System (Bio-Rad Laboratories, Singapore) according to the manufacturer's instructions. 18S rRNA was used as the housekeeping gene. The 2^−ΔΔCt method was used to calculate the gene expression levels.

Bioinformatics analyses.
Sequence quality was examined with FastQC (version 0.11.9) 16. Clean data were obtained using fastp (version 0.20.1) 17 to remove low-quality reads, adapter sequences, and reads containing poly-N. The clean reads were then mapped to the Mus musculus mm9 reference genome with HISAT2 (version 2.2.0) 18. The average input read count was 44.91 million per sample (range 20.12 million to 63.9 million), and the average percentage of uniquely aligned reads was 90.70% (range 87.57% to 92.5%). The read counts for each gene were calculated with HTSeq (version 0.11) 19. The third biological replicate was sequenced in a different batch from the other two replicates, so the expression levels were corrected for batch effects using the ComBat_seq function from the sva R package (version 3.42.0) 20. Principal component analysis (PCA) was performed with FactoMineR 21 to assess variance between sample groups and sample replicates. Differentially expressed genes were identified with DESeq2 (version 1.10.1). A heatmap was generated with pheatmap (version 1.0.12) 22 to show the differentially expressed genes between the control and EGCG-treated groups. The R package ggplot2 (version 3.3.4) 23 was used to generate volcano plots and compare the gene expression levels between the control and EGCG-treated groups.
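As an illustration of the counting step, per-sample HTSeq outputs can be merged into a single gene-by-sample count matrix before batch correction and differential expression analysis. The following is a minimal Python sketch, not the code used in this study: it assumes the standard two-column, tab-separated htseq-count output, in which summary rows (for example `__no_feature`) are prefixed with two underscores. The sample names, gene counts, and the helper names `read_htseq_counts` and `build_count_matrix` are hypothetical.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse one htseq-count output: tab-separated gene ID and read count."""
    counts = {}
    for gene_id, value in csv.reader(handle, delimiter="\t"):
        # htseq-count appends summary rows such as __no_feature; skip them.
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, rows)."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    # One row per gene, one column per sample; missing genes count as 0.
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


# Hypothetical two-sample example with made-up counts.
example = {
    "WAT_D4_ctrl_1": "Pparg\t120\nUcp1\t3\n__no_feature\t500\n",
    "WAT_D4_EGCG_1": "Pparg\t60\nUcp1\t1\n__no_feature\t450\n",
}
per_sample = {
    name: read_htseq_counts(io.StringIO(text)) for name, text in example.items()
}
genes, samples, rows = build_count_matrix(per_sample)
```

In practice each `io.StringIO` object would be replaced by an open file handle for the corresponding count file, and the resulting matrix would be passed to the R-based steps described above.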

Data Records
The RNA-seq data were deposited in the NCBI Sequence Read Archive (SRA) under the accession number SRP318155 24. Metadata describing the samples and the RNA-seq read statistics are provided in Tables 1 and 2, respectively.

Technical Validation
Quality control of cell differentiation. The quality of WAT and BAT SVF adipogenic differentiation was examined at Day 4 and Day 8 post differentiation. As shown in Fig. 2a, cells in the control groups changed morphology from spindle-shaped to round, and small lipid droplets began to accumulate at Day 4. Many more large droplets had accumulated by Day 8 post differentiation. Oil Red O staining showed that both WAT and BAT SVF cells differentiated well at Day 8. Compared to the control groups, the SVF cells treated with EGCG showed fewer lipid droplets, which indicated that EGCG inhibits SVF cell adipogenic differentiation. Consistent with these findings, real-time PCR analysis showed that adipose marker genes (Pparγ, Fabp4, Adiponectin, and Plin1) were expressed at much higher levels in the control groups than in the EGCG-treated groups. In addition, the BAT SVF cells in the control groups expressed approximately 30 times more Ucp1 (a BAT marker gene) than the WAT SVF cells (Fig. 2b). These results suggest that SVF cells in the control groups differentiated into the expected subtype of adipocytes and that EGCG inhibits SVF cell adipogenic differentiation.
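Relative expression values such as the Ucp1 comparison above are obtained with the 2^−ΔΔCt method named in the Methods. The arithmetic can be sketched in Python as follows. The Ct values are hypothetical and chosen only to make the calculation easy to follow; they are not measurements from this study, and the function name `relative_expression` is illustrative.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method."""
    # Normalise the target gene to the reference gene within each group.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the sample group with the calibrator (control) group.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: Ucp1 and 18S in BAT cells (sample)
# and in WAT cells (calibrator).
fold = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=10.0,
    ct_target_control=27.0, ct_ref_control=10.0,
)
```

With these made-up inputs, ΔCt is 12 in BAT and 17 in WAT, so ΔΔCt is −5 and the fold change is 2^5 = 32, the same order of magnitude as the roughly 30-fold difference reported for Ucp1.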
Quality control of RNA integrity. The quality of total RNA was examined with an Agilent Bioanalyzer 2100. All samples showed high RNA integrity (RIN values ranging from 7.7 to 10) and could be used for downstream sequencing.
RNA-Seq data quality. The raw RNA-seq data quality was determined by FastQC. A representative FastQC report is depicted in Fig. 3. As indicated, the reads had universally high quality values (Fig. 3a,b). The distribution of GC content was similar to the theoretical distribution, which indicates that the samples were free from contamination (Fig. 3c). Meanwhile, the sequence length distribution showed a single peak at 150 bp, which corresponded to the fragment sizes of the RNA-seq libraries (Fig. 3d). The geneBodyCoverage.py script from the RSeQC package was then used to assess read coverage across gene bodies, and no significant 5′ or 3′ end bias was identified (Fig. 3e). All other FastQC reports were similar, and all samples qualified for downstream analysis. Subsequently, a high percentage of reads (at least 87.57% in every sample) was mapped to the mm9 reference genome (Table 2). Next, gene expression levels were examined. The box plot showed that the global mRNA expression levels were similar in all samples (Fig. 4a). Furthermore, PCA of all expressed genes showed that samples derived from different tissues, time points, and treatments fell into distinct groups, suggesting variability between groups. Meanwhile, the biological replicates clustered near each other, indicating high repeatability (Fig. 4b).
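The box-plot comparison of global expression levels can be approximated numerically by summarizing log-transformed counts per sample. The sketch below is an illustration under stated assumptions rather than the analysis code of this study: it uses a log2(count + 1) transform and the median as the summary statistic, and the sample names and counts are made up.

```python
import math
import statistics


def log2_median_per_sample(counts_by_sample):
    """Return {sample: median of log2(count + 1)} as a quick distribution check."""
    return {
        sample: statistics.median(math.log2(c + 1) for c in counts)
        for sample, counts in counts_by_sample.items()
    }


# Hypothetical counts for five genes in two samples (made-up numbers).
counts_by_sample = {
    "sample_A": [0, 3, 15, 63, 255],
    "sample_B": [1, 7, 15, 31, 127],
}
medians = log2_median_per_sample(counts_by_sample)
```

Samples whose medians differ markedly from the rest would merit a closer look before differential expression analysis; in this toy example both medians are equal.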
Identification of genes affected by EGCG. To identify the genes affected by EGCG, gene expression levels of the control and EGCG-treated groups were analyzed with DESeq2. The threshold criteria were set as p-adj < 0.01 and |log2FoldChange| > 1. The genes that met these criteria were further analyzed with a heatmap and a volcano plot. Figure 4c shows that the samples from the control groups and the EGCG-treated groups fell into two separate clusters, indicating that EGCG robustly affects gene expression during adipogenesis. In addition, the triplicate samples clustered tightly, indicating high repeatability. Figure 4d shows that EGCG regulated the expression of many genes; the top differentially expressed genes ordered by p-adj were Cep170b, Pdpn, Soc1, Slc39a6, and Pidd1.
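The threshold criteria above can be applied to a DESeq2 results table exported from R. The following Python sketch assumes the standard DESeq2 column names `log2FoldChange` and `padj`, and treats a missing adjusted p-value (reported as NA by DESeq2) as not significant. The gene names and values are placeholders, not results from this dataset, and the function name `classify` is hypothetical.

```python
PADJ_CUTOFF = 0.01  # adjusted p-value threshold stated in the text
LFC_CUTOFF = 1.0    # absolute log2 fold-change threshold stated in the text


def classify(log2_fold_change, padj):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if padj is None or padj >= PADJ_CUTOFF:
        return "ns"
    if log2_fold_change > LFC_CUTOFF:
        return "up"
    if log2_fold_change < -LFC_CUTOFF:
        return "down"
    return "ns"


# Placeholder rows standing in for an exported DESeq2 results table.
results = [
    {"gene": "GeneA", "log2FoldChange": 2.3, "padj": 1e-8},
    {"gene": "GeneB", "log2FoldChange": -1.7, "padj": 0.004},
    {"gene": "GeneC", "log2FoldChange": 0.4, "padj": 1e-5},
    {"gene": "GeneD", "log2FoldChange": 3.0, "padj": None},
]

# Keep significant genes and order them by adjusted p-value.
significant = sorted(
    (r for r in results if classify(r["log2FoldChange"], r["padj"]) != "ns"),
    key=lambda r: r["padj"],
)
```

The same labels ('up', 'down', 'ns') can be used to color points in a volcano plot of log2 fold change against −log10 of the adjusted p-value.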

Usage Notes
RNA-seq has been widely used to study gene expression in recent decades. It is a powerful method for systematically determining the molecular pathways affected by bioactive substances. Recently, EGCG has been recognized as one of the natural products with high anti-obesity efficacy. The data identify genes regulated by EGCG during adipogenesis and provide information about the role of EGCG in lipid accumulation and fat metabolism.
One major advantage of the data is that they capture detailed stage- and tissue-specific gene expression profiles affected by EGCG. The data may be valuable to obesity-related clinical and basic research. Of note, RNA from the three replicates was collected simultaneously, but the third replicate was not sequenced at the same time as the other two. Although the quality of all RNA met the requirements of RNA-seq, it is advisable to correct for this batch effect in downstream analyses.

Code availability
In the current study, the following open access software was used as described in the Methods section. For all the software, we used default parameters, and no custom code was used beyond the tools listed.