Adipose tissue in health and disease through the lens of its building blocks

Understanding adipose tissue cellular heterogeneity and homeostasis is essential to comprehend cell type dynamics in metabolic diseases. Cellular subpopulations in adipose tissue have been linked to disease development, but efforts to characterize the cell type composition of adipose tissue remain limited. Here, we determine the cell type composition of adipose tissue by gene expression deconvolution of large amounts of publicly available transcriptomics data. This approach allows us to present a comprehensive study of adipose tissue cell type composition. We determine the relative amounts of 21 different cell types in 1282 adipose tissue samples and detail differences across four adipose tissue depots, between genders, across BMI ranges, and across stages of type 2 diabetes. We compare our results with previous marker-based studies through a literature review of adipose tissue cell type composition and propose candidate cellular markers to distinguish the different cell types within adipose tissue. This analysis reveals gender-specific differences in CD4+ and CD8+ T cell subsets, identifies adipose tissue as a rich source of multipotent stem/stromal cells, and highlights a strongly increased immune cell content in epicardial and pericardial adipose tissue compared with subcutaneous and omental depots. Overall, this systematic analysis provides comprehensive insights into adipose tissue cell type heterogeneity in health and disease.


Generation of signature matrices
To generate AT21, we collected single cell type gene expression data from 21 different cell types (204 samples in total) from publicly available datasets in the Gene Expression Omnibus (GEO) database [S1] (Figure 2A). For AT4, a single dataset with four different cell fractions was used. For each signature matrix, raw data (CEL files) of the corresponding reference dataset were downloaded and preprocessed with Affymetrix Power Tools (https://www.thermofisher.com/nl/en/home/lifescience/microarray-analysis/microarray-analysis-partners-programs/affymetrix-developersnetwork/affymetrix-power-tools.html#) using the robust multi-array average (RMA) normalization method.
The normalized reference dataset was then used to generate the AT21 or AT4 signature matrix with CIBERSORT [S2] (https://cibersort.stanford.edu). For each cell type, CIBERSORT first filters probes by their differential expression between the selected cell type and all other samples (two-sided unequal-variance t-test; false discovery rate q < 0.3). The remaining probes are then ranked by their fold change between the respective cell type and all other samples, and the top G probes per cell type are included in the signature matrix, where G (between 50 and 150) is chosen to minimize the condition number of the signature matrix [S2]. This resulted in a total of 1872 probes in AT21 (Supplementary Data S1) and 375 probes in AT4.
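The filtering and selection steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not CIBERSORT's own implementation. The input is assumed to be a probes × samples matrix of log2 (RMA) expression values. q-values come from the Benjamini-Hochberg procedure, and fold changes are differences of log2 means. Signature entries are assumed to be per-cell-type mean expression back-transformed to linear scale. The function names `bh_qvalues` and `build_signature` are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def build_signature(expr, labels, g_range=range(50, 151), q_cut=0.3):
    """Sketch of signature selection from a probes x samples log2 matrix."""
    labels = np.asarray(labels)
    cell_types = np.unique(labels)
    ranked = {}
    for ct in cell_types:
        inside, outside = expr[:, labels == ct], expr[:, labels != ct]
        # two-sided Welch (unequal variance) t-test per probe
        p = ttest_ind(inside, outside, axis=1, equal_var=False).pvalue
        q = bh_qvalues(np.nan_to_num(p, nan=1.0))
        log_fc = inside.mean(axis=1) - outside.mean(axis=1)
        keep = np.flatnonzero(q < q_cut)
        # probes passing the FDR filter, sorted by decreasing fold change
        ranked[ct] = keep[np.argsort(-log_fc[keep])]
    # assumption: signature entries are mean expression per cell type, linear scale
    means = np.column_stack(
        [2.0 ** expr[:, labels == ct].mean(axis=1) for ct in cell_types]
    )
    best = None
    for g in g_range:
        # union of the top-g probes of every cell type
        probes = np.unique(np.concatenate([ranked[ct][:g] for ct in cell_types]))
        if probes.size < len(cell_types):
            continue  # too few probes for a well-posed signature
        kappa = np.linalg.cond(means[probes])
        if best is None or kappa < best[0]:
            best = (kappa, g, probes)
    if best is None:
        raise ValueError("no probe set passed the filters")
    kappa, g, probes = best
    return means[probes], probes, g, kappa
```

Minimizing the condition number favors a signature whose columns are as close to linearly independent as possible, which keeps the subsequent regression of mixtures on the signature numerically stable.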
For each sample, CIBERSORT provides a deconvolution p-value calculated from 1000 bootstrapped permutations [S2], as well as a correlation value and the root-mean-square error (RMSE). All analyzed samples had a p-value < 0.0001 and an RMSE < 1. Correlation values were also high, with a median of 0.742 (IQR: 0.7238 to 0.7582), indicating that linear combinations of the cell types in the reference dataset reproduced the tissue expression values well. One dataset in particular (GSE26637, 20 SAT samples) had lower correlation values, which coincided with deviating cell type estimates compared with the other datasets, e.g. for osteoblasts (Supplementary Figure S5). Removing this dataset did not change the overall results (data not shown); nevertheless, it was excluded from the more in-depth analysis relating tissue composition to phenotypic traits.
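The per-sample quality measures above can be illustrated with a small fit. CIBERSORT itself uses nu-support vector regression with its own standardization. This sketch substitutes non-negative least squares (`scipy.optimize.nnls`), so its numbers are not expected to match CIBERSORT's exactly. It reports the Pearson correlation between the observed and reconstructed mixture and an RMSE computed on z-scored profiles. The permutation p-value is omitted, and `deconvolve`, `zscore` and `summarize` are hypothetical names.

```python
import numpy as np
from scipy.optimize import nnls


def zscore(v):
    return (v - v.mean()) / v.std()


def deconvolve(signature, mixture):
    """Fit one mixture profile as a non-negative combination of signature columns.

    NNLS stands in for CIBERSORT's nu-SVR here (a substitution, not the original method).
    """
    coef, _ = nnls(signature, mixture)
    if coef.sum() == 0:
        raise ValueError("all coefficients are zero")
    fractions = coef / coef.sum()  # relative cell type fractions summing to 1
    reconstructed = signature @ coef
    r = np.corrcoef(mixture, reconstructed)[0, 1]
    # RMSE on z-scored profiles so the value does not depend on the expression scale
    rmse = np.sqrt(np.mean((zscore(mixture) - zscore(reconstructed)) ** 2))
    return fractions, r, rmse


def summarize(correlations):
    """Median and interquartile range, as reported for the correlation values."""
    q1, med, q3 = np.percentile(correlations, [25, 50, 75])
    return med, (q1, q3)
```

For a noise-free mixture built from the signature itself, the fit recovers the true fractions with a correlation of 1 and an RMSE of 0. Real tissue profiles, which contain noise and cell types absent from the reference, give lower correlations such as those reported above.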

Independent (ex-vivo) validation of the TissueDecoder Framework
To generate the "validation dataset", two datasets (GSE73174 and GSE80654, Affymetrix Human Transcriptome Array 2.0) were downloaded as CEL files and preprocessed together as described above.
Subsequently, we matched probes across platforms with the biomaRt R package and quantile-normalized the validation dataset together with the reference and analysis datasets from the Affymetrix Human U133 Plus 2.0 microarray. The resulting validation dataset contains expression data from CD4+ T cells, CD8+ T cells, CD14+ monocytes, CD19+ B cells and CD56+ natural killer cells isolated from blood, as well as from adipocytes, progenitors/adipose stem cells (CD45-CD34+CD31-), and monocytes/macrophages (CD45+CD14+) isolated from adipose tissue.
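The two harmonization steps can be sketched as follows. The sketch assumes the biomaRt probe-to-gene mapping has already been applied, so that each platform's matrix is indexed by a shared identifier. `align_on_shared_ids` and `quantile_normalize` are hypothetical helpers. Ties are broken by sort order rather than averaged, a simplification compared with some quantile normalization implementations.

```python
import numpy as np


def align_on_shared_ids(ids_a, mat_a, ids_b, mat_b):
    """Keep rows whose identifier occurs on both platforms and stack samples side by side."""
    shared = sorted(set(ids_a) & set(ids_b))
    idx_a = {k: i for i, k in enumerate(ids_a)}  # duplicate ids: last occurrence wins
    idx_b = {k: i for i, k in enumerate(ids_b)}
    rows_a = [idx_a[k] for k in shared]
    rows_b = [idx_b[k] for k in shared]
    return shared, np.hstack([mat_a[rows_a], mat_b[rows_b]])


def quantile_normalize(matrix):
    """Give every column (sample) of a genes x samples matrix the same value distribution."""
    order = np.argsort(matrix, axis=0)
    # reference distribution: mean of each rank across all samples
    reference = np.sort(matrix, axis=0).mean(axis=1)
    out = np.empty(matrix.shape, dtype=float)
    for j in range(matrix.shape[1]):
        # the k-th smallest value in column j receives the k-th reference value
        out[order[:, j], j] = reference
    return out
```

After normalization every sample has an identical set of values, so platform-wide intensity differences no longer dominate comparisons between the validation, reference and analysis samples.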
We used the TissueDecoder framework to calculate the percentages of the 21 cell types of the AT21 signature matrix in the validation dataset and to evaluate the expression of conventional markers as well as of the primary markers reported by CellMaDe. The results are shown in Supplementary Figure S4 and Supplementary Data S2.

Application to RNASeq data
To test whether AT21 can deconvolve adipose tissue samples profiled by RNA-seq, we downloaded the preprocessed data ((effective) counts or RPKM/FPKM values) of 5 original datasets (GEO series: GSE107894, GSE57803, GSE65540, GSE66446, GSE95640), comprising a total of 503 adipose tissue samples. All expression values were converted to transcripts per million (TPM). For count data,

$$\mathrm{TPM}_i = \frac{c_i / l_i}{\sum_j c_j / l_j} \cdot 10^6,$$

where $c_i$ is the (effective) count and $l_i$ the (effective) length of gene $i$; for RPKM/FPKM values,

$$\mathrm{TPM}_i = \frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j} \cdot 10^6.$$

Furthermore, we mapped all gene identifiers, as well as the microarray probe names of AT21, to HGNC (HUGO Gene Nomenclature Committee) identifiers using the biomaRt R package.
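A minimal sketch of the standard TPM conversions, assuming genes × samples matrices and per-gene (effective) lengths in bases; the function names are hypothetical.

```python
import numpy as np


def counts_to_tpm(counts, lengths):
    """counts: genes x samples; lengths: per-gene (effective) length in bases."""
    rate = counts / lengths[:, None]  # reads per base for each gene and sample
    return rate / rate.sum(axis=0) * 1e6


def fpkm_to_tpm(fpkm):
    """RPKM/FPKM are already length-normalized; rescale each sample to sum to 1e6."""
    return fpkm / fpkm.sum(axis=0) * 1e6
```

Because RPKM/FPKM values are already divided by gene length, only the per-sample rescaling remains, whereas converting counts requires the lengths explicitly. Either way, each sample's TPM values sum to one million.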
Next, we applied the AT21-CIBERSORT deconvolution to these RNA-seq samples and compared the resulting cell type fractions with those obtained from microarray data of adipose tissue samples.

Evaluation of primary markers via Anatomically-annotated Tissue Expression Profiles
The primary and secondary criteria defined in CellMaDe depend on the cell types included in the analysis and can therefore arguably be considered (adipose) tissue-specific, provided that all relevant cell types of the given tissue are included. To evaluate the validity of the most promising primary marker identified per cell type, as well as its general applicability across tissues, we used Genevestigator [S4] to compare the expression of this marker across a large compendium of cell and tissue types (394 anatomically annotated tissue expression profiles).
Genevestigator is a tool that provides access to a normalized and curated database of publicly available transcriptomics profiles and enables reproducible data analysis. It is freely available for analyzing the anatomical tissue expression of candidate genes.
The results for the most promising marker per cell type are shown in Supplementary Figure S3.

$z_A$ denotes the number of adipocytes per gram of adipose tissue. Subsequently, we used the formula described above to convert x into percent of total cells (y). We estimated $z_A$ as

$$z_A = \frac{m_A}{V_A \cdot \rho_{\mathrm{fat}}} = \frac{0.95}{\frac{4}{3}\pi \cdot 50^3 \cdot 0.9196 \cdot 10^{-12}} \approx 1{,}972{,}995 \tag{5}$$

Here, $m_A = 0.95$ g is the assumed weight of adipocytes per gram of adipose tissue, $V_A = \frac{4}{3}\pi r^3$ is the average adipocyte volume assuming spherical adipocytes with a radius of $r = 50$ μm (diameter 100 μm) [S6], and $\rho_{\mathrm{fat}} = 0.9196$ kg/L is the density of fat. The factor $10^{-12}$ converts μm³ to cm³.
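As an arithmetic check of equation 5, the estimate of $z_A$ can be reproduced in a few lines. The constant names and the helper `adipocytes_per_gram` are hypothetical. The code makes the unit conversion 1 μm³ = 10⁻¹² cm³ explicit and uses 0.9196 kg/L = 0.9196 g/cm³.

```python
import math

M_A = 0.95          # g of adipocytes per g of adipose tissue (assumed in the text)
RADIUS_UM = 50.0    # adipocyte radius in micrometres (diameter 100 um)
DENSITY = 0.9196    # density of fat in g/cm^3 (equal to kg/L)
UM3_TO_CM3 = 1e-12  # 1 um^3 = 1e-12 cm^3


def adipocytes_per_gram(m_a=M_A, radius_um=RADIUS_UM, density=DENSITY):
    """Number of spherical adipocytes per gram of adipose tissue (equation 5)."""
    volume_cm3 = 4.0 / 3.0 * math.pi * radius_um ** 3 * UM3_TO_CM3
    mass_per_cell_g = volume_cm3 * density
    return m_a / mass_per_cell_g


z_a = adipocytes_per_gram()
print(round(z_a))  # 1972995
```

Since $z_A$ scales with $1/r^3$, the estimate is sensitive to the assumed adipocyte radius: halving the radius would increase $z_A$ eightfold.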
Finally, ten studies used flow cytometry of the stromal vascular fraction (SVF) to determine cell numbers as a percentage of SVF cells. We converted these values to percentages of total cells by dividing by three, assuming, as described above, that roughly one third of the cells in adipose tissue are SVF cells.