Transcriptome dataset of omental and subcutaneous adipose tissues from gestational diabetes patients

Gestational diabetes (GD) is one of the most prevalent metabolic diseases in pregnant women worldwide. GD is a risk factor for adverse pregnancy outcomes, including macrosomia and preeclampsia. Given the multifactorial etiology and the complexity of its pathogenesis, GD requires advanced omics analyses to expand our understanding of the disease. Next generation RNA sequencing (RNA-seq) was used to evaluate the transcriptomic profile of subcutaneous and omental adipose tissues (AT) collected from patients with gestational diabetes and matched controls. Samples were harvested during cesarean delivery. Results show differences based on anatomical location and provide whole-transcriptome data for further exploration of gene expression patterns unique to GD patients.

Measurement(s): RNA sequencing
Technology Type(s): Paired-End Sequencing
Factor Type(s): Group • Adipose Tissue Site
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: city
Sample Characteristic - Location: Bogota, Colombia


Patient ID | Age | Group | Pregestational BMI | Insulin (µU/mL) 1 | Glucose (mg/dL) 1

Table 3. Total reads and clean reads obtained from next-generation sequencing of omental and subcutaneous adipose tissue depots collected during C-Section from gestational diabetes patients and controls.
www.nature.com/scientificdata

pregnancy, hypertension, hypo- or hyperthyroidism, autoimmune diseases, chronic diseases, and active tuberculosis were excluded. Table 1 presents a descriptive summary of demographics and blood biomarkers of the GD patients and controls.
AT samples from the SC and OM depots were collected during the C-Section. In brief, the SC samples were harvested from the incision area using a surgical scalpel. OM samples were collected from the surgical area using scissors and ligature at the omentum majus level. Both AT samples were flash-frozen and stored in liquid nitrogen until processing. Total RNA was then extracted from OM and SC samples using TRIzol and the Quick RNA MiniPrep kit (R1054; Zymo Research, Irving, CA, USA), which includes a DNase step to remove genomic DNA, according to the manufacturer's protocol.

Data Records
Raw FASTQ data are available in the NCBI Gene Expression Omnibus (GEO) under accession number GSE188799 12. The raw read count matrix was deposited under the same accession number 12. The processed read count matrix and the DEGs found in patients with gestational diabetes are available in Supplemental Table 1 13.

Fig. 1 Evaluation of sequence quality scores in raw FASTQ data. The quality of FASTQ files was estimated using FastQC, and summary plots across samples were generated with MultiQC. All 40 FASTQ files were assessed, and plots for GC content, mean per-base quality and per-sequence quality in terms of Phred score are presented. (A) Results for omental samples. (B) Results for subcutaneous samples.

Technical Validation
Purity, concentration, and integrity of mRNA were checked using a NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and an Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). All samples had a 260:280 nm ratio between 1.9 and 2.1 and an RNA integrity number ≥ 7 (Table 2). At least 1 µg of each sample was used for NGS.
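The acceptance criteria stated above can be expressed as a simple filter. The following is a minimal sketch, assuming each sample is described by its 260:280 nm ratio, RNA integrity number and available RNA mass; the class name, field names and example values are hypothetical and are not taken from Table 2.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    sample_id: str    # hypothetical identifier, not a real patient ID
    a260_a280: float  # 260:280 nm absorbance ratio (NanoDrop)
    rin: float        # RNA integrity number (Bioanalyzer)
    mass_ug: float    # RNA mass available for sequencing, in micrograms


def passes_qc(sample: RnaSample) -> bool:
    """Apply the thresholds reported in the text."""
    purity_ok = 1.9 <= sample.a260_a280 <= 2.1
    integrity_ok = sample.rin >= 7
    amount_ok = sample.mass_ug >= 1.0
    return purity_ok and integrity_ok and amount_ok


# Made-up values for illustration only.
samples = [
    RnaSample("OM_example_1", 2.02, 8.1, 1.5),
    RnaSample("SC_example_1", 1.75, 7.4, 2.0),  # fails the purity window
]
passed = [s.sample_id for s in samples if passes_qc(s)]
print(passed)
```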
RNA sequencing. All RNA-seq was performed at the Beijing Genomics Institute [BGI, Shenzhen/Hong Kong, China (www.genomics.cn)], with paired-end sequencing (100 bp) on the DNBSEQ platform. BGI's process filters out reads with excessively high levels of unknown bases (N), reads with adaptor contamination, and low-quality reads with a score below 15. On average, 4.5 million adapter sequences were filtered, and the average size of clean reads was 4.46 Gb per sample (range 4.43-4.48 Gb). The ratio of clean reads was 93.7% (Table 3). Raw RNA sequencing data were obtained from BGI as FASTQ files, and subsequent data processing and quality control were performed by the authors with FastQC v0.11.8 14 (www.bioinformatics.babraham.ac.uk/projects/fastqc/).
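The three filtering criteria and the clean-read ratio can be illustrated with a short sketch. This is not BGI's pipeline: the text gives only the quality cutoff of 15, so the maximum fraction of N bases, the maximum fraction of low-quality bases, the exact-match adapter search and the adapter string itself are all assumptions made for illustration.

```python
PLACEHOLDER_ADAPTER = "TTTTGGGG"  # hypothetical sequence, not a real DNBSEQ adapter


def is_clean_read(seq, quals, adapter,
                  max_n_frac=0.05,     # assumed threshold
                  min_q=15,            # quality cutoff stated in the text
                  max_lowq_frac=0.2):  # assumed threshold
    """Return True if a read survives the adapter, N-content and quality filters.

    seq   : base calls as a string
    quals : per-base Phred scores as a list of ints, same length as seq
    """
    if not seq:
        return False
    if adapter and adapter in seq:  # exact substring match only
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    low_q = sum(1 for q in quals if q < min_q)
    return low_q / len(seq) <= max_lowq_frac


def clean_ratio(reads, adapter):
    """Fraction of reads that pass all filters."""
    kept = sum(is_clean_read(seq, quals, adapter) for seq, quals in reads)
    return kept / len(reads)


# Toy reads for illustration only.
reads = [
    ("ACGTACGTAC", [30] * 10),            # passes
    ("ACGTNNNNAC", [30] * 10),            # too many unknown bases
    ("ACGTACGTAC", [10] * 5 + [30] * 5),  # too many low-quality bases
    ("ACTTTTGGGG", [30] * 10),            # contains the placeholder adapter
]
ratio = clean_ratio(reads, PLACEHOLDER_ADAPTER)
print(f"{ratio:.1%}")
```

Applied to real data, the same ratio (clean reads divided by total reads) corresponds to the per-sample values summarized in Table 3.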
Quality assessment of total RNA and RNA-seq data. FastQC quality reports for the raw RNA-seq reads were compiled using MultiQC 15. Basic quality assessments included Phred scores, per-sequence and per-base quality scores, GC content, overrepresented k-mers, and duplicated reads; the presence of adaptors was also re-checked. The MultiQC output was used to identify global tendencies in the quality metrics across SC and OM samples (Fig. 1).
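For readers unfamiliar with these metrics, the sketch below shows how Phred scores, per-base mean quality and GC content can be derived from FASTQ records. It assumes the common Phred+33 quality encoding and is a simplified illustration, not a reimplementation of FastQC or MultiQC; the function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def per_base_mean_quality(quality_lines, offset=33):
    """Mean Phred score at each read position across a set of reads."""
    totals, counts = [], []
    for line in quality_lines:
        for i, q in enumerate(phred_scores(line, offset)):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def gc_percent(seq):
    """GC content as a percentage of called (A/C/G/T) bases."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(b in "GC" for b in called) / len(called)


# Toy records: 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
means = per_base_mean_quality(["IIII", "5555"])
print(means)
print(gc_percent("ACGT"))
```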
Reads mapping and counts. After the quality check, reads were mapped to the Homo sapiens reference genome (GRCh37/hg19) using HISAT 2.1.0 16. The resulting BAM files were sorted using SAMtools 17 on the high-performance computing system at the Institute for Cyber-Enabled Research (ICER), Michigan State University. Mapping results are summarized in Fig. 2A. The average mapping ratio with the reference genome was 91.8%. Next, featureCounts v2.0.1 18 was used to summarize the number of raw reads (Fig. 2B). On average, 35.9 million reads (73.8%) were assigned to coding genes.
Differential expression analysis in tissue-specific profiles. For differential expression analysis, count data were normalized using the DESeq2 negative binomial distribution model 19. Sample variance was established using principal component analysis (PCA) plotting and hierarchical clustering (complete linkage