Single-cell transcriptional networks in differentiating preadipocytes suggest drivers associated with tissue heterogeneity

White adipose tissue plays an important role in physiological homeostasis and metabolic disease. Different fat depots have distinct metabolic and inflammatory profiles and are differentially associated with disease risk. It is unclear whether these differences are intrinsic to the pre-differentiated stage. Using single-cell RNA sequencing, a novel network methodology and a data integration technique, we predict metabolic phenotypes in differentiating cells. Single-cell RNA-seq profiles of human preadipocytes during adipogenesis in vitro identify at least two distinct classes of subcutaneous white adipocytes. These differences in gene expression are separate from the processes of browning and beiging. Using a systems biology approach, we identify a new network of zinc-finger proteins that are expressed in one class of preadipocytes and are potentially involved in regulating adipogenesis. Our findings provide a deeper understanding of both the heterogeneity of white adipocytes and their link to normal metabolism and disease.

Boxplots are centered on the median; the box spans the interquartile range (IQR) from the 25th to the 75th percentile, and the whiskers extend to 1.5 × IQR above the 75th percentile (maximum) and below the 25th percentile (minimum).
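As an illustration of the box-and-whisker convention described above, the following minimal Python sketch computes the quartiles, IQR, and whisker ends for one group of values. It assumes that each whisker stops at the most extreme observation lying within 1.5 × IQR of the box (the usual Tukey convention, and matplotlib's default `whis=1.5`); the function name `boxplot_stats` is hypothetical.

```python
import numpy as np


def boxplot_stats(values, whis=1.5):
    """Median, quartiles, IQR, and Tukey-style whisker ends for one box."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Assumption: whiskers stop at the most extreme observation inside the fences.
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": x[x >= lo_fence].min(),
        "whisker_high": x[x <= hi_fence].max(),
    }
```

For example, with the values 1 to 9 plus an outlier at 100, the upper whisker ends at 9 and the value 100 would be drawn as an outlier point.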

A Network Decomposition Algorithm for Single-cell RNA Sequencing
In this section, we describe an algorithm for detecting active connected subnetworks in the protein-protein interaction (PPI) network such that the observed expression of each gene across all cells is regulated (and approximated) by the sum of the activities of the subnetworks of which the gene is a member.
The inputs to the algorithm are an expression matrix $X \in \mathbb{R}^{m \times n}$ and a PPI network $G = (V, E)$. The rows of $X$ correspond to the $m$ samples or cells and the columns to the $n$ genes. Each node in $G$ is associated with a column (gene) of $X$.
The PPI network is obtained from Pathway Commons.
In addition, the algorithm has three input parameters: the number of subnetworks to be found (denoted by $r$), the maximum size of any subnetwork (denoted by $k$), and an integer $s$ that specifies the number of seed nodes used in the algorithm, as described below. The parameter $r$ roughly corresponds to the number of gene clusters one would seek in a more traditional clustering of gene expression data.
The output of the algorithm consists of $r$ detected subnetworks in the PPI network, along with cell-specific activity levels for each subnetwork. For $i \in \{1, \dots, r\}$, we denote the $i$th subnetwork by a binary indicator vector $w_i \in \{0,1\}^{n}$ and its activity level over cells by a real vector $h_i \in \mathbb{R}^{m}$. The algorithm generates an approximation of the input matrix that can be expressed as

$$\tilde{X} = \sum_{i=1}^{r} h_i w_i^{\top}.$$

More compactly, we collect $w_1, \dots, w_r$ as the rows of a matrix $W$, so that $W_{ij} = 1$ iff subnetwork $i$ includes gene $j$, and $h_1, \dots, h_r$ as the columns of a matrix $H$. The problem our algorithm tries to solve can then be written as

$$\min_{W, H} \; \lVert X - H W \rVert_F^2,$$

where each row of $W \in \{0,1\}^{r \times n}$ denotes a connected subnetwork of $G$ with at most $k$ nodes, whose signature activity over cells forms the corresponding column of $H \in \mathbb{R}^{m \times r}$. Supplementary Figure 10 shows an example of the network decomposition for $r = 2$.
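The objective and the connectivity constraint can be made concrete with a short Python sketch using NumPy. It assumes the PPI network is given as a dictionary mapping each gene (column) index to the set of its neighbors' indices; the helper names `is_connected_subnetwork` and `objective` are hypothetical and not part of the authors' code.

```python
from collections import deque

import numpy as np


def is_connected_subnetwork(w, adjacency, k):
    """True if the genes with w[j] == 1 form a connected subgraph of at most k nodes."""
    members = set(np.flatnonzero(w).tolist())
    if not members or len(members) > k:
        return False
    # Breadth-first search restricted to the selected genes.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members


def objective(X, H, W):
    """Squared Frobenius reconstruction error ||X - H W||_F^2."""
    return float(np.linalg.norm(X - H @ W, "fro") ** 2)
```

A candidate factorization is feasible only if every row of `W` passes `is_connected_subnetwork`; among feasible candidates, a lower `objective` value means a better approximation.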
We develop a heuristic algorithm to solve the above problem. It consists of two simpler procedures:
1) a greedy method that consecutively finds connected subnetworks, each of which best approximates the residual data matrix obtained after subtracting the previous solutions; and
2) a greedy method that, for a given residual data matrix $R$, finds a single connected subnetwork $w$ and a signature vector $h$ such that assigning the signature vector to all the genes in the detected subnetwork minimizes the reconstruction error of the residual matrix, i.e., $\min_{w, h} \lVert R - h w^{\top} \rVert_F^2$.
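For a fixed indicator vector $w$ with gene set $S$, the rank-1 subproblem has a closed-form optimal signature: setting the gradient of $\lVert R - h w^{\top} \rVert_F^2$ with respect to $h$ to zero gives $h = R w / |S|$, i.e., the per-cell mean of the residual over the genes in $S$, with remaining error $\lVert R \rVert_F^2 - \lVert R w \rVert^2 / |S|$. The Python sketch below uses this score inside a simple seed-and-grow search that repeatedly adds the neighboring gene that most lowers the error, up to $k$ genes. This growth rule is an assumption for illustration only; the actual FindSubgraph procedure is specified in Supplementary Figure 12 and may differ. The names `rank1_error` and `grow_subgraph` are hypothetical.

```python
import numpy as np


def rank1_error(R, genes):
    """||R - h w^T||_F^2 at the optimal h, where w indicates `genes`.

    The optimal h is R w / |S| (per-cell mean over the genes), which leaves
    an error of ||R||_F^2 - ||R w||^2 / |S|.
    """
    idx = sorted(genes)
    Rw = R[:, idx].sum(axis=1)
    return float((R ** 2).sum() - (Rw @ Rw) / len(idx))


def grow_subgraph(R, adjacency, seed, k):
    """Hypothetical seed-and-grow search (not the authors' FindSubgraph).

    Starting from `seed`, repeatedly add the neighboring gene that most lowers
    the rank-1 error, stopping at k genes or when no neighbor helps.
    """
    genes = {seed}
    best = rank1_error(R, genes)
    while len(genes) < k:
        frontier = sorted({v for u in genes for v in adjacency.get(u, ()) if v not in genes})
        if not frontier:
            break
        cand = min(frontier, key=lambda v: rank1_error(R, genes | {v}))
        err = rank1_error(R, genes | {cand})
        if err >= best:
            break  # no neighbor improves the fit
        genes.add(cand)
        best = err
    w = np.zeros(R.shape[1])
    w[sorted(genes)] = 1.0
    h = R @ w / w.sum()  # optimal signature for the final gene set
    return w, h, best
```

Because only connected neighbors are ever added, the returned gene set is connected in $G$ by construction.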

Supplementary Figure 10: CG decomposition overview.
In the description below, we call the first procedure GreedyCG and the second FindSubgraph; GreedyCG calls FindSubgraph as a subroutine. We describe each procedure in turn.
Supplementary Figure 11 shows the GreedyCG algorithm in pseudocode. The number of seed nodes, $s$, specifies how many times the algorithm calls the FindSubgraph procedure to find each subnetwork. We provide more details about seed nodes when describing the FindSubgraph algorithm (Supplementary Figure 12).
The GreedyCG algorithm starts by setting the residual matrix $R$ to the original expression matrix $X$. It then consecutively computes rank-1 approximations of the residual matrix and subtracts each one from it. Each rank-1 solution identifies a connected subnetwork and a signature expression vector. In each iteration, the algorithm constructs one row of $W$, namely the binary indicator vector of the detected subnetwork. The algorithm repeats this procedure $r$ times to obtain a rank-$r$ factorization. Finally, it finds the optimal $H$ via least squares. The signature vectors of the detected subnetworks need not be saved while constructing the rank-1 solutions, because $H$ is unconstrained and can therefore be computed optimally and efficiently for a given $W$ by least squares.
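The outer loop of GreedyCG, together with the final least-squares step for $H$, might be sketched in Python as follows. The rank-1 search is passed in as a caller-supplied function `find_rank1` (a hypothetical stand-in for the multi-seed FindSubgraph search), and each temporary signature is taken as the rank-1 optimum $h = R w / |S|$, an assumption consistent with minimizing the rank-1 error but not stated in the text. Because $H$ is unconstrained, the last step solves $\min_H \lVert X - H W \rVert_F$ with `numpy.linalg.lstsq` on the transposed system $W^{\top} H^{\top} \approx X^{\top}$.

```python
import numpy as np


def greedy_cg(X, r, find_rank1):
    """Sketch of the GreedyCG outer loop followed by a least-squares fit of H.

    find_rank1(R) is a hypothetical caller-supplied search returning a binary
    indicator vector w (length n) of a connected subnetwork for residual R.
    """
    R = np.array(X, dtype=float)     # residual starts as a copy of X
    rows = []
    for _ in range(r):
        w = np.asarray(find_rank1(R), dtype=float)
        h = R @ w / w.sum()          # temporary rank-1 signature, not stored
        R -= np.outer(h, w)          # subtract the rank-1 solution
        rows.append(w)
    W = np.vstack(rows)              # r x n binary matrix
    # H is unconstrained: solve W^T H^T ~= X^T in the least-squares sense.
    Ht, *_ = np.linalg.lstsq(W.T, np.asarray(X, dtype=float).T, rcond=None)
    return W, Ht.T                   # H has shape m x r
```

Discarding `h` inside the loop mirrors the text: only the rows of $W$ are kept, and $H$ is refit jointly at the end, which can only lower (never raise) the reconstruction error relative to the greedily accumulated signatures.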
Each subnetwork detection iteration involves an inner loop, which finds $s$ candidate solutions by calling the FindSubgraph algorithm $s$ times, each time with a different seed node. Among these candidates, it selects the one that minimizes the approximation error of the current residual matrix. This step is needed because our subnetwork detection is heuristic and its result depends on the initial seed. Increasing $s$ increases the chance of finding a rank-1 solution that better approximates the residual expression data, at the cost of proportionally more calls to FindSubgraph.
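The seed-selection step of the inner loop can be isolated as a small Python helper that runs a seeded search once per seed and keeps the candidate with the lowest rank-1 error on the current residual. The seeded search `find_subgraph(R, seed)` and the name `best_over_seeds` are hypothetical, and how the $s$ seeds are chosen is left to the caller, since that is described with FindSubgraph (Supplementary Figure 12) rather than here.

```python
import numpy as np


def best_over_seeds(R, seeds, find_subgraph):
    """Keep the seeded candidate that best approximates the residual R.

    find_subgraph(R, seed) is a hypothetical callable returning a binary
    indicator vector w; each candidate is scored by ||R - h w^T||_F^2 with
    its optimal signature h = R w / |S|.
    """
    best_w, best_err = None, np.inf
    for seed in seeds:
        w = np.asarray(find_subgraph(R, seed), dtype=float)
        h = R @ w / w.sum()
        err = float(np.sum((R - np.outer(h, w)) ** 2))
        if err < best_err:           # keep the lowest-error candidate so far
            best_w, best_err = w, err
    return best_w, best_err
```

With $s$ seeds the helper evaluates exactly $s$ candidates, so runtime grows linearly in $s$ while the returned error can only stay equal or decrease as seeds are added.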