Transcriptional synergy as an emergent property defining cell subpopulation identity enables population shift

Okawa, Satoshi; Saltó, Carmen; Ravichandran, Srikanth; Yang, Shanzheng; Toledo, Enrique M.; Arenas, Ernest; del Sol, Antonio

doi:10.1038/s41467-018-05016-8

Article
Open access
Published: 03 July 2018

Nature Communications volume 9, Article number: 2595 (2018)

Abstract

Single-cell RNA sequencing allows defining molecularly distinct cell subpopulations. However, the identification of specific sets of transcription factors (TFs) that define the identity of these subpopulations remains a challenge. Here we propose that subpopulation identity emerges from the synergistic activity of multiple TFs. Based on this concept, we develop a computational platform (TransSyn) for identifying synergistic transcriptional cores that determine cell subpopulation identities. TransSyn leverages single-cell RNA-seq data, and performs a dynamic search for an optimal synergistic transcriptional core using an information theoretic measure of synergy. A large-scale TransSyn analysis identifies transcriptional cores for 186 subpopulations, and predicts identity conversion TFs between 3786 pairs of cell subpopulations. Finally, TransSyn predictions enable experimental conversion of human hindbrain neuroepithelial cells into medial floor plate midbrain progenitors, capable of rapidly differentiating into dopaminergic neurons. Thus, TransSyn can facilitate designing strategies for conversion of cell subpopulation identities with potential applications in regenerative medicine.

Introduction

Recent advances in single-cell RNA-seq technologies have made it possible to classify cells into distinct subpopulations based on their gene expression profiles. The identity of these cell subpopulations can range from well-defined cell types and subtypes of the same cell type to cells of unclear character. It has been observed that a handful of specific TFs is sufficient to maintain cell subpopulation identity¹. Identification of such core TFs can facilitate the characterization and conversion of any cell subpopulation, including rare and previously unknown ones, thus opening novel functional applications². However, this is a challenge since the core TFs that determine the identity of such novel cell subpopulations are largely unknown. Importantly, the definition of identity TFs depends on the cellular context in which it is employed³. In the context of cell/tissue types, for example between neurons and hepatocytes, the identity TFs are defined by the comparison between these largely different cell types. However, in the context of cell subpopulations within a cell type, such as different subtypes of dopaminergic neurons⁴, the definition of identity TFs becomes subtler due to the increased commonality between them.

Existing methods for identifying TFs for cell identity or cellular conversions^{5,6,7} rely on a set of gene expression profiles of bulk cell/tissue types. Consequently, the application of these methods is limited to those bulk cell/tissue types, and cannot be applied to novel subpopulations of cells identified in a newly generated single-cell dataset. In addition, these methods detect potential identity TFs by focusing on properties of individual TFs, such as gene expression levels or the number of their unique target genes, rather than emergent properties of potential identity TFs themselves, such as transcriptional synergy among them.

Combinatorial binding of specific TFs to enhancers is known to result in a synergistic activity essential for robust and specific transcriptional programmes during development⁸. The functionality of several TFs operating together to achieve a common output has been studied in detail in embryonic stem cells (ESCs), where a transcriptional core involving Pou5f1, Sox2, and Nanog controls pluripotency⁹. Furthermore, it has been observed in different systems that multiple TFs are required to function cooperatively to sustain the overall cellular phenotype¹⁰.

Here, we propose the general concept that cell subpopulation identity is an emergent property arising from a synergistic activity of multiple TFs that stabilizes their gene expression levels. Based on this concept, we develop a computational platform, TransSyn, for the identification of synergistic transcriptional cores defining cell subpopulation identities. TransSyn does not depend on the inference of gene regulatory networks (GRNs), which are often incomplete and whose topological characteristics do not always capture the multiple direct and indirect interactions between genes. In addition, it only requires single-cell RNA-seq data of distinct subpopulations as input (Fig. 1a), and does not depend on pre-compiled gene expression datasets or any other prior knowledge. Consequently, TransSyn infers subpopulation identities within a cell population, and aids in designing strategies to convert cell subpopulation identities, especially in cases of closely related subpopulations in functionally different states. Finally, as a direct application of TransSyn, we show that the knowledge of cell subpopulation-specific synergistic transcriptional cores enables experimental conversion of human hindbrain neuroepithelial cells into medial floor plate midbrain progenitors, which rapidly differentiate into dopaminergic (DA) neurons. Thus, TransSyn can facilitate conversion of cell subpopulation identities with potential applications in regenerative medicine.

Results

Rationale and outline of the method

TransSyn identifies a specific combination of TFs that are most frequently expressed and exhibit high transcriptional synergy computed by multivariate mutual information (MMI)¹¹. MMI measures the information (i.e., predictability) gained by an additional variable (TF), which cannot be explained by the simple summation of the information given by the subsets of variables. For example, MMI among three TFs, X, Y, and Z, is defined as:

$${\mathrm{MMI}}\left( {X;Y;Z} \right) = I\left( {X;Z} \right) + I\left( {Y;Z} \right) - I\left( {X,Y;Z} \right),$$

This indicates that when MMI is negative, the three TFs are synergistically interacting with each other, because the knowledge of both X and Y together (i.e., I(X, Y; Z)) provides more information about Z than the sum of the information provided by X and Y separately (i.e., I(X; Z)+I(Y; Z)) (Fig. 1b). The same principle applies to MMI with higher numbers of variables. In this way, TransSyn considers all possible direct and indirect regulatory interactions that can be measured by gene expression. Therefore, it can account for the disparate nature of synergistic transcriptional regulation, including combinatorial/cooperative binding of TFs to target gene promoter/enhancer regions⁸, and protein-protein interactions among transcriptional co-factors.
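As an illustration of the sign convention, the following minimal sketch evaluates the three-TF formula above on toy binary data. It is not the TransSyn implementation; the function names and the toy "cells" are assumptions made for this example. When Z is the exclusive-or of X and Y, neither TF alone is informative about Z but both together determine it, so MMI is negative; when Z simply copies X, no synergy is present and MMI is zero.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    """Shannon entropy (bits) of a sequence of hashable observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for paired observations."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def mmi3(x, y, z):
    """MMI(X;Y;Z) = I(X;Z) + I(Y;Z) - I(X,Y;Z)."""
    xy = list(zip(x, y))
    return (mutual_information(x, z) + mutual_information(y, z)
            - mutual_information(xy, z))

# Hypothetical toy data: each on/off combination of X and Y observed once.
x, y = zip(*product([0, 1], repeat=2))
z_synergy = [a ^ b for a, b in zip(x, y)]  # Z requires X and Y together
z_copy = list(x)                           # Z is explained by X alone

print(mmi3(x, y, z_synergy))  # negative value: synergistic
print(mmi3(x, y, z_copy))     # zero: no synergy
```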

TransSyn requires single-cell RNA-seq data for MMI computation. Ideally, MMI for all possible combinations of TFs should be calculated to identify the most synergistic TF combination. However, such computation is infeasible (for example, the number of all combinations of 3, 4, 5, and 6 TFs among 100 TFs already adds up to 1,271,422,845). Therefore, we implemented a dynamic search algorithm, in which an initial set of the most synergistic 3-TF combinations (seed combinations) is progressively extended by adding TFs one by one as long as the MMI calculated for the new combination is lower (i.e., more synergistic) than the MMI of the previous seed combination (Fig. 1c; Supplementary Fig. 1) (see Methods). The search is terminated when the addition of a new TF results in no further decrease in MMI, and the current TF combination exhibiting the lowest MMI (i.e., most synergistic) is considered the synergistic transcriptional core. Upon termination, if more than one TF combination exhibits the highest synergy, they are ranked by another information theoretic measurement, total correlation (TC), which, unlike MMI, incorporates interactions between all possible combinations of TFs within each core, providing a measure of interaction strength¹².
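The size of the exhaustive search space quoted above can be checked with a few lines of arithmetic. The snippet below is only a verification of the count for 100 TFs; the variable names are chosen for this example.

```python
from math import comb

n_tfs = 100
# Number of distinct TF combinations of each size from 3 to 6
counts = {k: comb(n_tfs, k) for k in range(3, 7)}
total = sum(counts.values())

for k, c in counts.items():
    print(f"{k}-TF combinations: {c:,}")
print(f"total: {total:,}")
```

The 6-TF term alone contributes more than a billion combinations, which is why an exhaustive evaluation is replaced by a seeded, stepwise search.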

TransSyn captures known synergistic transcriptional cores

By applying TransSyn to a large compilation of published single-cell RNA-seq data, we created a catalog of synergistic transcriptional cores specific to 186 cell subpopulations (Supplementary Data 1). Here, by subpopulations we mean distinct groups of cells within a heterogeneous cell population identified based on their gene expression profiles, and do not discriminate between well-defined cell types, subtypes of the same cell type, and cells with unclear identity. The predicted synergistic transcriptional cores, when evidence is available, consistently contained TFs known to maintain the respective cell subpopulation identities. For example, the key pluripotency factors POU5F1, NANOG, and SOX2 that maintain the ESC phenotype were found as the most synergistic transcriptional core in hESCs (Table 1; Supplementary Data 1). Notably, these TFs have been speculated to act synergistically via large clusters of enhancers¹³. Another example is the blood progenitor subpopulation¹⁴ that contained Tal1, Gata2, Runx1, and Fli1 in its synergistic transcriptional core (Table 1; Supplementary Data 1). These TFs have been shown to form complexes via protein-protein interactions that stabilize their co-operative binding to DNA and synergistically control the subpopulation identity¹⁵. Therefore, this represents another known example where a synergistic interaction of TFs defines a cell subpopulation identity. Finally, the synergistic core of the human fetal oculomotor and trochlear nucleus (hOMTN) subpopulation consisted of ISL1 and PHOX2A (Supplementary Data 1), which have been shown to synergistically specify cranial motor neurons from mESCs¹⁶.

Table 1 Most synergistic transcriptional cores predicted by TransSyn and top 10 JSD TFs in example subpopulations, where known identity TFs are in bold

TransSyn predictions also contained several TFs known to interact with each other to control cell subpopulation identities. For example, Gata1, Gfi1b, Klf1, and Ikzf1, known to maintain embryonic blood cells^{17, 18}, were found in the synergistic transcriptional core of the embryonic primitive erythrocyte subpopulation¹⁴ (Supplementary Data 1). Gata1 and Ikzf1 are known to functionally regulate each other. In addition, the synergistic transcriptional core of the embryonic visceral endoderm subpopulation¹⁴ included Eomes, Otx2, Zic3, Foxa2, and Hnf4a (Table 1; Supplementary Data 1), which are known to regulate each other and other downstream targets specific to this cell subpopulation^{19, 20}. The synergistic transcriptional core of vascular endothelial cells contained Id3, Klf13, Klf6, and Klf4, which are known for their roles in the acquisition of vascular endothelial cell fate^{21, 22}. The synergistic core of the mouse enteroendocrine cell contained Neurog3, Neurog1, Insm1, Nkx2-2, Foxa1, Foxa2, Pax4, and Lmx1a, all of which are known to be essential for the functioning of the cell^{23,24,25,26,27,28,29}. We also examined the synergistic transcriptional core of the human subpopulations for which only mouse functional data is available, such as hProgFPM and hDA2 neurons thought to give rise to substantia nigra DA neurons postnatally⁴. The synergistic transcriptional core of hDA2 neurons identified NR4A2, a nuclear receptor that controls mDA neuron identity and survival in mice³⁰, and FOXA1, a TF that together with FOXA2 controls mDA identity and neurogenesis in mice³¹. Finally, the hProgFPM synergistic core included TFs previously identified in the mouse midbrain floor plate and important for mDA neuron development in mice, such as FOXA2, OTX2, and LMX1A^{10, 32,33,34}, which have not been previously recognized as the core of hProgFPM (Table 1; Supplementary Data 1).
Overall, these examples demonstrate that synergistic transcriptional cores identified by TransSyn recapitulated known TFs controlling cell type/subpopulation identities along with their known functional, potentially synergistic interactions.

Evaluation of TransSyn performance

For an unbiased assessment of TransSyn performance, we calculated the percentage of cell subpopulations where at least one predicted TF has previously been experimentally validated to define the identity of that cell subpopulation. This showed that TransSyn could predict at least one such TF for 85% of the cell subpopulations for which at least one experimentally validated TF is known. We followed this criterion since the current knowledge of experimentally validated TFs is not complete (i.e., many TFs have not previously been tested) and includes TFs which are not classified as identity TFs according to our definition. The compiled list of TFs known to maintain cell subpopulation identities is shown in Supplementary Data 2.

Importantly, we observed that pair-wise mutual information (MI) was not able to capture all the interactions among TFs in synergistic cores, supporting that these TFs interact synergistically rather than pair-wise (Supplementary Data 3). For example, this was observed in the case of interaction between the three pluripotency TFs (NANOG, POU5F1, and SOX2) in hESC, and Runx1, Fli1, Gata2, and Tal1 in the blood progenitor subpopulation described above, due to the multifactorial nature of the transcriptional regulatory mechanism. On the contrary, a set of TFs exhibiting pair-wise interactions among themselves does not necessarily display a multiple synergistic interaction, and therefore will not represent a synergistic transcriptional core. To show this, we performed a topological analysis of subpopulation-specific GRNs inferred from pair-wise co-expression to identify the top 10 subpopulation-specific hubs that could potentially be TFs that define subpopulation identities. Results showed that only a few known TFs were recovered as unique hubs (Table 2; Supplementary Data 4), indicating that transcriptional synergy is more suited for unraveling TFs that define subpopulation identities.

Table 2 Unique top 10 hub TFs in GRNs for the example subpopulations in Table 1. Known identity TFs are in bold.

Next, we compared the performance of TransSyn to a method for identifying candidate identity TFs for bulk cell/tissue types using Jensen-Shannon Divergence (JSD)⁷. Since JSD was computed from bulk microarray data in this earlier study, we computed JSD using the average single-cell gene expression in each cell subpopulation. Results showed that, in general, JSD predicted at least one known TF in 33% of cell subpopulations, in contrast to the 85% achieved by TransSyn (Table 1A; Supplementary Data 5). This result shows that TransSyn is more suited for identifying TFs that define closely related cell subpopulations. A systematic comparison with other tools, such as CellNet⁵ and Mogrify⁶, was not possible, since they do not currently consider user input single-cell RNA-seq data. Indeed, their built-in cell/tissue types exhibit a very limited overlap with the cell subpopulations we collected in this study. In particular, CellNet shares no overlap, while Mogrify shares a very limited overlap (Supplementary Table 1). The reprogramming factors identified by the latter for ESCs, NSCs, pancreatic mast cells and endothelial cells contained known identity TFs, whereas the factors for neurons from NSCs and between lung fibroblast and bronchial epithelial cells did not contain any known TF.

Experimental validation of predicted identity TFs

Finally, to demonstrate the usefulness of TransSyn, we carried out an experiment to shift the identity of the hindbrain hNES cell line (SAI2)³⁵ to that of midbrain hProgFPM cells⁴. We first generated single-cell RNA-seq data of hNES cells, and found that its synergistic transcriptional core is quite different from that of hProgFPM cells (Fig. 2a). Analysis of the TFs required to convert hNES cells into hProgFPM cells identified OTX2, LMX1A, and FOXA2 (Fig. 2a). Since OTX2 is known to induce LMX1A³⁶, the conversion was performed by inducing the expression of the other two TFs, OTX2 and FOXA2. This was achieved by treating hNES cells during proliferation (FGF2+EGF) with two factors: (i) the small molecule smoothened agonist (SAG, 500 nM), which directly activates Shh signaling³⁷ and induces FOXA2³⁸; and (ii) the Wnt antagonist Dickkopf1 (Dkk1, 150 ng/ml), to reduce Wnt/β-catenin signaling to the levels required to induce OTX2¹⁰ and midbrain dopamine neuron development³⁹ (Fig. 2b). Our expectation was that SAG would ventralize hNES cells and change their baso-lateral identity³⁵ into floor plate cells expressing FOXA2; and Dkk1 would anteriorize hindbrain cells expressing GBX2 into midbrain cells expressing OTX2⁴⁰. Our results show that treatment of proliferating hNES cells with SAG and Dkk1 did not change the levels of the common midbrain-hindbrain TFs engrailed1 (EN1) and PAX2 (Fig. 2c, d), but increased the ratio of OTX2:GBX2 expression (Fig. 2e), indicating efficient anteriorization and acquisition of midbrain identity. In addition, we also observed increased levels of FOXA2 (Fig. 2f) and decreased levels of lateral genes, such as PAX6 and IRX3 (Fig. 2g, h), indicative of efficient ventralization. These results were also confirmed by immunohistochemistry, which showed increased numbers of OTX2-positive cells (Fig. 2i).

To further confirm that the identity of the hNES cells had become that of hProgFPM cells, we tested their function, as assessed by their capacity to induce the expression of LMX1A at a later time-point and differentiate into midbrain DA neurons, reasoning that cells with the correct identity will be more efficient at generating DA neurons than the parental cells. Differentiation involved the removal of mitogens (FGF2 and EGF), as well as treatment with well-known midbrain patterning and differentiation factors such as Shh, Wnt5a, BDNF, GDNF, and TGFβ3 (reviewed in ref.⁴⁰). In addition, we tested whether treatment with FGF8, a factor that was recently found to improve midbrain patterning and differentiation in human ES cells⁴¹, was capable of further improving our protocol (Fig. 3a). We found that while both protocols strongly increased OTX2 and decreased GBX2 expression, only the protocol without FGF8 significantly increased LMX1A expression at day 8, as assessed by RT-qPCR (Fig. 3b). Similarly, both protocols increased the levels of NR4A2 and SLC6A3, but TH expression was only significantly increased by the SAG and Dkk1 protocol (Fig. 3c). Accordingly, while control unconverted cells were only capable of giving rise to rare and weak TH+ cells, cells differentiated after SAG and Dkk1 treatment gave rise to abundant TH+ neurons (Fig. 3d), with a significant increase in the number of OTX2+ cells (Fig. 3e) and of TH+ cells (Fig. 3f). Moreover, TH+ cells were also LMX1A+, NR4A2+, and PBX1+ (Fig. 3g–i), confirming their midbrain identity^{30, 33, 46}. In addition, TH+ cells expressed the mature neuronal marker MAP2 (Fig. 3j) and some of them were found to acquire a mature neuronal morphology with long processes and varicosities and bipolar morphology, typical of mDA neurons (Fig. 3k). Thus, our results show that by switching the identity of hNES to hProgFPM prior to differentiation, it is possible to rapidly differentiate hNES into DA neurons.

Discussion

In this study we postulated that cell subpopulation identity is determined by TFs that exhibit transcriptional synergy. Based on this proposition, we developed a computational method that dynamically searches for optimal synergistic transcriptional cores using an information theoretic measure of synergy computed from single-cell RNA-seq data. The predicted transcriptional cores recapitulated known identity TFs in 85% of the tested cases and known synergistic TF interactions that relate to cell identity. Thus, the concept of transcriptional synergy employed in TransSyn represents a novel approach to specifically identifying transcriptional cores defining cell subpopulation identities. Following the experimental validation of the predicted identity transcriptional core of hProgFPM cells, we compiled a list of TFs whose up-/down-regulation may convert one cell subpopulation into another for 3786 pairs of initial and target cell subpopulations (Supplementary Data 6). Further validation of these transcriptional cores will reinforce the generality of the method. Importantly, unlike previously introduced methods, TransSyn does not require pre-compiled reference single-cell datasets, which are unavailable for newly identified cell subpopulations. In addition, TransSyn does not rely on GRN inference and analysis, which could be a bottleneck for accurate predictions of identity transcriptional cores. In summary, such unbiased identification of synergistic transcriptional cores may facilitate the development of general strategies for cell subpopulation conversions, opening up novel functional applications in regenerative medicine, such as the generation of DA neurons for Parkinson’s disease.

Methods

Single-cell RNA-seq data

Single-cell RNA-seq data used in this study were obtained for the following biological systems: the mouse datasets for lung, striatum, cortex, and hippocampus, quiescent and active NSCs, intestine, circulating pancreatic tumor cells, hair follicles, and gastrulating embryo, and the human datasets for midbrain, CD127+ lymphoid cells, pancreas, ovarian cancer, germline cells, and in vitro hESCs. The reference to each dataset is given in Supplementary Data 1. We used the same subpopulation classifications defined in the respective original studies. We also analyzed other datasets not listed here; however, they did not have a sufficient number of expressed TFs in the majority of cell subpopulations, and were therefore discarded. In addition, synergistic transcriptional cores for cell subpopulations that were either “undefined” or contained fewer than three cells were not considered. We did not reprocess the raw data; the same gene expression values that were used in the original studies were also used in this study. TFs were considered “expressed” if their expression values were ≥1 in RNA-seq FPKM/RPKM/TPM values, ≥10 in normalized read counts, or ≥1 in UMI counts. TFs below these thresholds were considered “not expressed”. Exceptionally, the expression cutoff of 10 was used for the hESC dataset, since setting it to 1 resulted in too many expressed TFs and the subsequent computation became infeasible.

Identification of most frequently expressed TFs

The definition of TFs was obtained from the AnimalTF database⁴². The fraction of cells expressing each TF was computed in each subpopulation and the top 10% most frequently expressed TFs were shortlisted for further analyses. Among these TFs, we discarded those that were not expressed in more than 30% of cells. For the La Manno et al., 2016, dataset⁴, the binarized expression status estimated in the original study was used. Since this filtering retained many TFs that were expressed at very low intensity, TFs with mean UMI count <1 were further discarded. Since the subsequent computation becomes infeasible for standard desktop computers if the number of TFs is more than 150, in these cases the TFs with the highest coefficient of variation were discarded to make the number of TFs ≤150.
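The shortlisting steps described above could be sketched as follows. This is a hypothetical illustration rather than the TransSyn code: the function name, the input layout (a dictionary from TF name to per-cell expression values), the rounding used for the top 10%, and the default threshold of 1 are all assumptions made for this example, and the dataset-specific handling of binarized or UMI data is omitted.

```python
from statistics import mean, pstdev

def shortlist_tfs(expr, threshold=1.0, top_fraction=0.10,
                  min_cell_fraction=0.30, max_tfs=150):
    """expr maps a TF name to its expression values, one per cell."""
    # Fraction of cells in which each TF passes the expression threshold
    frac = {tf: sum(v >= threshold for v in vals) / len(vals)
            for tf, vals in expr.items()}
    # Keep the top 10% most frequently expressed TFs (rounding is an assumption)
    ranked = sorted(frac, key=frac.get, reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    # Discard TFs that are not expressed in more than 30% of cells
    kept = [tf for tf in ranked[:n_top] if frac[tf] > min_cell_fraction]
    # If too many TFs remain, drop those with the highest coefficient of variation
    if len(kept) > max_tfs:
        kept = sorted(kept, key=lambda tf: pstdev(expr[tf]) / mean(expr[tf]))[:max_tfs]
    return kept

# Hypothetical toy subpopulation: 20 cells, TF_i expressed in the first i cells
toy = {f"TF{i}": [5.0] * i + [0.0] * (20 - i) for i in range(1, 21)}
print(shortlist_tfs(toy))
```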

MMI computation

In each cell subpopulation MMI¹¹ among the most frequently expressed TFs was computed by:

$${\mathrm{MMI}}\left( S \right) = - \mathop {\sum }\limits_{T \subseteq S} \left( { - 1} \right)^{\left| T \right|}H\left( T \right),$$

where $S = \left\{ X_1, X_2, \ldots, X_n \right\}$, $T$ is a subset of $S$, and $\left| T \right|$ indicates the number of variables in this subset. In the current study these variables are discretized gene expression values of TFs. This equation becomes MI if only two variables are considered. In the case of three variables the equation can be written as:

$${\mathrm{MMI}}\left( {X;Y;Z} \right) = H\left( X \right) + H\left( Y \right) + H\left( Z \right) - H\left( {X,Y} \right) - H\left( {Y,Z} \right) - H\left( {Z,X} \right) + H\left( {X,Y,Z} \right)$$
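The general inclusion-exclusion form can be written compactly in code. The sketch below is an illustration under simplifying assumptions: it works on already discretized values, skips the empty subset (whose entropy is zero), and omits the entropy normalization described in the following paragraph. The function names and toy vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def joint_entropy(columns):
    """Joint Shannon entropy (bits) of one or more discretized variables."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())

def mmi(columns):
    """MMI(S) = -sum over non-empty subsets T of S of (-1)^|T| H(T)."""
    total = 0.0
    for k in range(1, len(columns) + 1):
        for subset in combinations(columns, k):
            total += (-1) ** k * joint_entropy(subset)
    return -total

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

print(mmi([x, x]))     # two identical variables: reduces to MI = H(X)
print(mmi([x, y, z]))  # three-variable case: negative for the XOR triplet
```

With two variables the sum reduces to H(X) + H(Y) - H(X,Y), and with three variables it reproduces the expanded expression given above.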

To compute Shannon’s entropy, gene expression values were first log10-transformed. Zero gene expression values were converted into 1 prior to the transformation. The transformed values were then discretized within each cell subpopulation using the Freedman–Diaconis rule implemented in the R nclass.FD function, and Shannon’s entropy of each TF was computed on these discretized values. The input value for the nclass.FD function was set to the number of cells +1 for FPKM/RPKM/TPM values and normalized read counts, while the number of cells +6 was used for UMI counts. The range of gene expression values was set between 0 and the maximum value of a given cell subpopulation. Since the bin size for each TF is different, the entropy was normalized by the theoretical maximum entropy (i.e., the entropy when all bins contain an equal number of observations) to enable a direct comparison between different TF entropies. The MMI was then computed using all cells in the entire population of a given dataset except for the ones in the subpopulation for which MMI is being computed. As described above, the joint entropy was also normalized by the theoretical maximum entropy prior to MMI computation.
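A rough sketch of the discretization and normalization for a single TF is given below. It applies the textbook Freedman–Diaconis rule (bin width 2 × IQR × n^(-1/3)) and does not attempt to reproduce the specific nclass.FD input values mentioned above; the function names, the quartile method, and the clamping of values below zero into the first bin are assumptions of this example.

```python
from collections import Counter
from math import ceil, log10, log2
from statistics import quantiles

def fd_bin_count(values, lo, hi):
    """Freedman-Diaconis bin count for the range [lo, hi]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    width = 2 * (q3 - q1) * len(values) ** (-1 / 3)
    return max(1, ceil((hi - lo) / width)) if width > 0 else 1

def normalized_entropy(expression):
    """Entropy of log10 expression, scaled by the maximum entropy log2(bins)."""
    # Zeros are mapped to 1 so that they become 0 after the log transform
    logged = [log10(v if v > 0 else 1.0) for v in expression]
    lo, hi = 0.0, max(logged)
    bins = fd_bin_count(logged, lo, hi)
    if bins == 1:
        return 0.0  # a single bin carries no information
    # Equal-width bins between 0 and the maximum; out-of-range values are clamped
    idx = [min(max(int((v - lo) / (hi - lo) * bins), 0), bins - 1) for v in logged]
    n = len(idx)
    h = -sum((c / n) * log2(c / n) for c in Counter(idx).values())
    return h / log2(bins)

# Hypothetical expression values of one TF across eight cells
print(normalized_entropy([0, 1, 10, 100, 1000, 0, 10, 100]))
print(normalized_entropy([10] * 8))
```

Dividing by log2(bins) bounds the result between 0 and 1, which is what makes entropies of TFs with different bin counts comparable.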

Dynamic search for synergistic transcriptional cores

MMI was first computed for all combinations of three TFs and the top one percent lowest MMI (i.e., synergistic) combinations were taken. Then, these TF combinations were ranked by TC defined as:

$${\mathrm{TC}}\left( S \right) = \left( {\mathop {\sum }\limits_{X_i \in S} H\left( {X_i} \right)} \right) - H\left( S \right),$$

where $S = \{ X_1, X_2, \ldots, X_n \}$. TC measures the interaction strengths (MI) shared among all subsets of the variables within a combination, and is more appropriate for comparing interaction strengths between different combinations than MMI¹², which measures the information gain from the previous seed combination. Then, the top one percent highest TC combinations were used as initial seeds for the subsequent search for higher-level synergistic combinations of TFs. To this end, new TFs were added to each seed combination one by one and MMI for the new combination was computed. Then, the combinations that showed lower (less than 0.05) MMI than the seed were taken to the next iteration. For example, if {A, B, C, D, E, F, G, H} were the selected TFs in a given cell subpopulation and {A, B, C} was a seed combination, then MMI of all 4-TF combinations, {A, B, C, D}, {A, B, C, E}, {A, B, C, F}, {A, B, C, G}, and {A, B, C, H}, was computed, and if the difference in MMI between the new combination and the seed was negative, then that new combination was kept. Then, these new, more synergistic TF combinations were again ranked by TC and the top 10 combinations were used as seeds for the identification of the best 5-TF combinations next, and so on. This procedure was continued until no new combination was more synergistic than the seed. We also terminated the procedure when the number of TFs reached 15, since continuing with more than this number was often computationally impractical. We think this operation is acceptable, since usually at this point most TFs are shared among different combinations. Once the search was terminated, MMI for all combinations of the top 20 best TC combinations was computed, and if any more synergistic combinations were found, those combinations were ranked by TC as the final synergistic transcriptional cores.
If more than one top combination (i.e., ties) was present, they were ranked by the highest summed mean gene expression and the top three combinations were kept as the final synergistic transcriptional cores. For the identification of cell conversion TFs, TFs in the synergistic transcriptional core of a target cell subpopulation were ranked by the mean gene expression fold change between the target cell subpopulation and the starting cell subpopulation. The main part of TransSyn was written in C++, which was wrapped in R using the Rcpp package.
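The seeded search can be sketched in simplified form as follows. This is not the C++/Rcpp implementation; it is a minimal reading of the procedure that keeps only its skeleton (triplet seeds ranked by MMI and TC, stepwise extension while MMI decreases, a cap on core size). The 0.05 threshold, the final re-check over the top 20 TC combinations, tie-breaking by mean expression, and entropy normalization are deliberately omitted, and all function and parameter names are hypothetical. At least three TFs are assumed.

```python
from collections import Counter
from itertools import combinations
from math import log2

def H(cols):
    """Joint Shannon entropy (bits) of discretized variables."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def mmi(cols):
    """Multivariate mutual information by inclusion-exclusion."""
    return -sum((-1) ** k * H(sub)
                for k in range(1, len(cols) + 1)
                for sub in combinations(cols, k))

def tc(cols):
    """Total correlation: sum of single entropies minus joint entropy."""
    return sum(H([c]) for c in cols) - H(cols)

def greedy_core(data, n_seeds=10, max_size=15):
    """data maps a TF name to its discretized expression across cells."""
    score = lambda combo, f: f([data[t] for t in combo])
    # Seeds: lowest-MMI 1% of triplets, re-ranked by highest TC
    triplets = sorted(combinations(sorted(data), 3), key=lambda c: score(c, mmi))
    n_keep = max(1, len(triplets) // 100)
    seeds = sorted(triplets[:n_keep], key=lambda c: -score(c, tc))[:n_seeds]
    best = min(seeds, key=lambda c: score(c, mmi))
    while seeds and len(seeds[0]) < max_size:
        extended = set()
        for seed in seeds:
            base = score(seed, mmi)
            for tf in data:
                if tf not in seed:
                    cand = tuple(sorted(seed + (tf,)))
                    if score(cand, mmi) < base:  # keep only more synergistic sets
                        extended.add(cand)
        if not extended:
            break  # no extension lowers MMI: stop
        seeds = sorted(extended, key=lambda c: -score(c, tc))[:n_seeds]
        round_best = min(seeds, key=lambda c: score(c, mmi))
        if score(round_best, mmi) < score(best, mmi):
            best = round_best
    return best

# Hypothetical toy data: C is the XOR of A and B, D is constant
toy = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "C": [0, 1, 1, 0], "D": [0, 0, 0, 0]}
print(greedy_core(toy))
```

In the toy case the XOR triplet is the only synergistic seed, and adding the uninformative TF D does not lower MMI, so the search stops at three TFs.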

MI computation between TFs

Pair-wise MI was computed for TF pairs in transcriptional synergistic cores, in which at least two TFs are known to maintain that cell subpopulation. The gene expression values were first log2-transformed and then discretized within each cell subpopulation using the Freedman–Diaconis rule, as described above. Then Shannon’s entropy of each TF and joint entropies of each pair of TFs were computed on these discretized values. MI was then computed by:

$${\mathrm{MI}}\left( {X;Y} \right) = H\left( X \right) + H\left( Y \right) - H\left( {X,Y} \right),$$

The statistical significance of each edge was computed by a t-test against a null distribution formed by randomizing data 50 times and edges with the top 1% lowest p-value were kept as the final edges.
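One way to picture the randomization step is sketched below: MI is computed for a TF pair and compared with the MI obtained after shuffling one of the two expression vectors 50 times. The text does not specify exactly how the data were randomized or how the t-test was set up, so the shuffling scheme, the function names, and the summary returned here (null mean and standard deviation rather than a p-value) are assumptions of this example.

```python
import random
from collections import Counter
from math import log2
from statistics import mean, stdev

def entropy(samples):
    """Shannon entropy (bits) of discretized observations."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def mutual_information(x, y):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def mi_with_null(x, y, n_random=50, seed=0):
    """Observed MI plus mean and SD of MI after shuffling y n_random times."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    null = []
    for _ in range(n_random):
        shuffled = list(y)
        rng.shuffle(shuffled)  # breaks the pairing between x and y
        null.append(mutual_information(x, shuffled))
    return observed, mean(null), stdev(null)

# Hypothetical discretized expression of two perfectly coupled TFs in 40 cells
tf_a = [0, 1] * 20
tf_b = list(tf_a)
observed, null_mean, null_sd = mi_with_null(tf_a, tf_b)
print(observed, null_mean, null_sd)
```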

GRN hub analysis

A GRN for each cell subpopulation was inferred using the corresponding cell subpopulation single-cell RNA-seq data with four different algorithms: Pearson correlation coefficient (PCC), Spearman correlation coefficient (SCC), MRNET⁴³, and a random forest-based method (GENIE3⁴⁴). The default parameters were used for GENIE3. For RNA-seq FPKM/RPKM/TPM values and RNA-seq normalized read counts, the values were log2-transformed prior to the inference. No transformation was applied to UMI counts.

JSD computation for TFs

For each TF, JSD was computed between an ideal gene expression vector and an observed gene expression vector, as previously described⁷. The ideal gene expression vector was formed by assigning 1 to the query cell subpopulation and 0 to all other subpopulations within a dataset. The observed gene expression vector was formed by computing the average gene expression for each subpopulation and normalizing each value by the sum of the average gene expression values of all the subpopulations. The top 10 TFs were taken as the predicted identity TFs.
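The comparison between the ideal and observed vectors can be illustrated with the standard definition of the Jensen-Shannon divergence, JSD(P, Q) = H((P+Q)/2) - (H(P) + H(Q))/2. The sketch below is an assumption-laden example rather than the code used in the study: the function names and the toy mean-expression values are hypothetical, and the TF is assumed to be expressed in at least one subpopulation so that the normalization is defined.

```python
from math import log2

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(v * log2(v) for v in p if v > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def jsd_to_ideal(mean_expr, query):
    """mean_expr maps each subpopulation to the mean expression of one TF."""
    names = list(mean_expr)
    total = sum(mean_expr.values())
    observed = [mean_expr[s] / total for s in names]        # normalized profile
    ideal = [1.0 if s == query else 0.0 for s in names]     # query-only profile
    return jsd(observed, ideal)

# Hypothetical mean expression of two TFs across three subpopulations
specific_tf = {"subpop_a": 10.0, "subpop_b": 0.0, "subpop_c": 0.0}
ubiquitous_tf = {"subpop_a": 5.0, "subpop_b": 5.0, "subpop_c": 5.0}
print(jsd_to_ideal(specific_tf, "subpop_a"))
print(jsd_to_ideal(ubiquitous_tf, "subpop_a"))
```

A TF expressed only in the query subpopulation matches the ideal vector and has zero divergence, whereas a uniformly expressed TF is further from it.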

Cultivation of Lt-NES SAI2 cells

In our study we used the long-term self-renewing neuroepithelial-like stem cell (Lt-NES) line SAI2, generated from human hindbrain fetal tissue³⁵. Mycoplasma-free cells were kept in proliferation according to previously described protocols⁴⁵, in 6-well plates coated with poly-L-ornithine (1:5 in water; Sigma) and laminin (1:500 in water, Invitrogen), using maintenance media based on DMEM F12 Glutamax Medium (GIBCO, LifeTechnologies) supplemented with N2 (1:100, GIBCO, LifeTechnologies), B27 (1:1000, GIBCO, LifeTechnologies), and the growth factors hEGF (10 ng/ml, R&D) and FGF2 (10 ng/ml, R&D). To modify the identity of Lt-NES, cells were treated for 48 h with SAG (500 nM, Tocris) and Dkk1 (150 ng/ml, R&D) in the proliferation media. Treated and non-treated cells were compared.

For differentiation experiments, Lt-NES cells treated as above for 48 h were seeded at a density of 100,000 cells in 48-well plates coated with PLO and laminin. Cells were differentiated for 6 days following the protocol described in ref.⁴⁶ with some modifications: cells were patterned for 2 days in media containing N2 Supplement (1:100), B27 (1:1000), Shh (200 ng/ml, R&D) and Wnt5a (100 ng/ml) with or without FGF8B (100 ng/ml, PeproTech). Cells were subsequently differentiated for 4 days in media containing N2 (1:100) and B27 (1:100), supplemented during the first 2 days with GDNF (20 ng/ml, R&D) and BDNF (20 ng/ml, R&D), and during the last 2 days with GDNF (20 ng/ml, R&D), BDNF (20 ng/ml, R&D), dcAMP (0.5 mM, Sigma), ascorbic acid (200 µM, Sigma), and TGFβ3 (2 ng/ml, R&D).

hNES single-cell RNA-seq data

Single-cell RNA-seq data of undifferentiated Lt-NES cells were obtained from GEO (accession GSE114670) and analyzed.

RT-qPCR

RNA was extracted from Lt-NES SAI2 cells using the RNeasy mRNA isolation system (Qiagen) according to the manufacturer’s instructions and treated with DNase using the on-column protocol. 200–500 ng of total RNA was reverse-transcribed using the Superscript II kit (Invitrogen). The reverse-transcribed cDNA was amplified using Fast SYBR Green (Applied Biosystems) in a StepOnePlus real-time qPCR machine (Applied Biosystems). Outliers were detected using the absolute deviation from the median; statistical significance was assessed with the Welch two-sample, two-tailed t-test, with Bonferroni correction in the case of multiple testing. Stars indicate *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001.
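
The outlier screen and significance annotation described above can be illustrated with a short Python sketch. It is a hedged example, not the analysis code used in the study. The function names are hypothetical, the cutoff of 3 median absolute deviations is an assumption (the text does not state a threshold), and the p-values are assumed to come from a Welch two-sample, two-tailed t-test computed elsewhere.

```python
import statistics


def mad_outliers(values, cutoff=3.0):
    """Flag values whose absolute deviation from the median exceeds cutoff * MAD.

    cutoff=3.0 is an assumed threshold chosen for illustration only.
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        # No spread around the median; nothing can be flagged by this rule.
        return [False] * len(values)
    return [d / mad > cutoff for d in abs_dev]


def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def stars(p):
    """Map a p-value to the star notation used in the text."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


# Toy relative-expression replicates; the last value is an obvious outlier.
replicates = [1.0, 1.2, 0.9, 1.1, 5.0]
flags = mad_outliers(replicates)

# Hypothetical raw p-values from three Welch t-tests.
raw_p = [0.01, 0.04, 0.0002]
labels = [stars(p) for p in bonferroni(raw_p)]
print(flags, labels)
```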

Primers used in our analysis:

SLC6A3 Forward 5′---3′ ACCTTCCTCCTGTCCCTGTT

Reverse 3′---5′ CACCATAGAACCAGGCCACT

EN1 Forward 5′---3′ CGTGGCTTACTCCCCATTTA

Reverse 3′---5′ TCTCGCTGTCTCTCCCTCTC

FOXA2 Forward 5′---3′ TTCAGGCCCGGCTAACTCT

Reverse 3′---5′ AGTCTCGACCCCCACTTGCT

GAPDH Forward 5′---3′ TTGAGGTCAATGAAGGGGTC

Reverse 3′---5′ GAAGGTGAAGGTCGGAGTCA

GBX2 Forward 5′---3′ GTTCCCGCCGTCGCTGATGAT

Reverse 3′---5′ GCCGGTGTAGACGAAATGGCCG

IRX3 Forward 5′---3′ CTGACGAGGAGGGAAACGCTTA

Reverse 3′---5′ GAGCTCCTCCTCCTCCAGCTCT

LMX1A Forward 5′---3′ GATCCCTTCCGACAGGGTCTC

Reverse 3′---5′ GGTTTCCCACTCTGGACTGC

NR4A2 Forward 5′---3′ AGTCTGATCAGTGCCCTC

Reverse 3′---5′ CCCCATTGCAAAAGATGAGT

OTX2 Forward 5′---3′ ACAAGTGGCCAATTCACTCC

Reverse 3′---5′ GAGGTGGACAAGGGATCTGA

PAX2 Forward 5′---3′ TAGACTGCGGACTGGGGTCTTC

Reverse 3′---5′ GGTTCTTACCACCGGCAGATTG

PAX6 Forward 5′---3′ TGGTATTCTCTCCCCCTCCT

Reverse 3′---5′ TAAGGATGTTGAACGGGCAG

TH Forward 5′---3′ ACTGGTTCACGGTGGAGTTC

Reverse 3′---5′ TCTCAGGCTCCTCAGACAGG

SLC18A2 Forward 5′---3′ CACTGCCTCCATCTCAGACA

Reverse 3′---5′ CCGGTGACCATAGTCGAGTT

Immunocytochemistry on differentiated hNES cells

For immunocytochemical analysis, cells were fixed for 20 min at room temperature in 4% paraformaldehyde (PFA) in PBS, then permeabilized and blocked for 60 min in PBS containing 0.3% Triton X-100, 0.1% BSA and 10% normal donkey serum (PBTA-NDS). They were then incubated overnight at 4 °C in PBTA-NDS with the following primary antibodies: rabbit TH (1:1000, Pel Freeze, P4010-0), mouse TH (1:250, Immunostar, 22941) or sheep TH (1:250, Novus, NB300-110), mouse MAP2 (1:100, Sigma, M4403), mouse PBX1 (1:200, Santa Cruz, SC-101851), rabbit NR4A2 (1:200, Santa Cruz, SC-990), rabbit LMX1A (1:4000, Millipore, AB10533), and goat OTX2 (1:1000, Bio-techne, AF1979). The next day, the cells were washed three times with PBS and incubated for 2 h at room temperature with Alexa Fluor secondary antibodies (1:500, Invitrogen) 647 (A31571), 555 (A31572, A21432, A21436), 488 (A11035, A21467, A21206, A11015), and 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI, Sigma, D8417) in PBTA-NDS. Microphotographs were taken with a Zeiss LSM800 confocal microscope (CLICK facility, Karolinska Institute) using the same settings. Cell counts were performed in a blinded fashion in 3 independent experiments, with 6–9 randomly selected fields per condition. Control and experimental images were processed linearly, in the same way, using Fiji software (ImageJ version 1.51t) and Photoshop CS5 (Adobe System Inc.).

Code availability

TransSyn is freely available at https://sourceforge.net/projects/transsyn/.

Data availability

The single-cell RNA-seq data of undifferentiated Lt-NES are available at GEO: GSE114670. The rest of the data supporting the conclusions of this study are available from the corresponding author upon request.

References

1. Morris, S. A. & Daley, G. Q. A blueprint for engineering cell fate: current technologies to reprogram cell identity. Cell Res. 23, 33–48 (2013).
2. Wagner, A., Regev, A. & Yosef, N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. 34, 1145–1160 (2016).
3. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).
4. La Manno, G. et al. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell 167, 566–580 (2016). e519.
5. Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 903–915 (2014).
6. Rackham, O. J. et al. A predictive computational framework for direct reprogramming between human cell types. Nat. Genet. 48, 331–335 (2016).
7. D’Alessio, A. C. et al. A systematic approach to identify candidate transcription factors that control cell identity. Stem Cell Rep. 5, 763–775 (2015).
8. Spitz, F. & Furlong, E. E. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).
9. Boyer, L. A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005).
10. Chung, S. et al. Wnt1-lmx1a forms a novel autoregulatory loop and controls midbrain dopaminergic differentiation synergistically with the SHH-FoxA2 pathway. Cell Stem Cell 5, 646–658 (2009).
11. Bell, A. J. The co-information lattice. 4th International Symposium Independent Component Analysis and Blind Source Separation, 921 (2003).
12. Timme, N., Alford, W., Flecker, B. & Beggs, J. M. Synergy, redundancy, and multivariate information measures: an experimentalist’s perspective. J. Comput. Neurosci. 36, 119–140 (2014).
13. Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).
14. Scialdone, A. et al. Resolving early mesoderm diversification through single-cell expression profiling. Nature 535, 289–293 (2016).
15. Wilson, N. K. et al. Combinatorial transcriptional control in blood stem/progenitor cells: genome-wide analysis of ten major transcriptional regulators. Cell Stem Cell 7, 532–544 (2010).
16. Mazzoni, E. O. et al. Synergistic binding of transcription factors to cell-specific enhancers programs motor neuron identity. Nat. Neurosci. 16, 1219–1227 (2013).
17. Vassen, L. et al. Growth factor independence 1b (gfi1b) is important for the maturation of erythroid cells and the regulation of embryonic globin expression. PLoS ONE 9, e96636 (2014).
18. Ross, J., Mavoungou, L., Bresnick, E. H. & Milot, E. GATA-1 utilizes Ikaros and polycomb repressive complex 2 to suppress Hes1 and to promote erythropoiesis. Mol. Cell Biol. 32, 3624–3638 (2012).
19. Chen, W. S. et al. Disruption of the HNF-4 gene, expressed in visceral endoderm, leads to cell death in embryonic ectoderm and impaired gastrulation of mouse embryos. Genes Dev. 8, 2466–2477 (1994).
20. Coffinier, C., Thepot, D., Babinet, C., Yaniv, M. & Barra, J. Essential role for the homeoprotein vHNF1/HNF1beta in visceral endoderm differentiation. Development 126, 4785–4794 (1999).
21. Das, J. K., Voelkel, N. F. & Felty, Q. ID3 contributes to the acquisition of molecular stem cell-like signature in microvascular endothelial cells: its implication for understanding microvascular diseases. Microvasc. Res. 98, 126–138 (2015).
22. Suzuki, T., Aizawa, K., Matsumura, T. & Nagai, R. Vascular implications of the Kruppel-like family of transcription factors. Arterioscler. Thromb. Vasc. Biol. 25, 1135–1141 (2005).
23. Desai, S. et al. Nkx2.2 regulates cell fate choice in the enteroendocrine cell lineages of the intestine. Dev. Biol. 313, 58–66 (2008).
24. Bjerknes, M. & Cheng, H. Neurogenin 3 and the enteroendocrine cell lineage in the adult mouse small intestinal epithelium. Dev. Biol. 300, 722–735 (2006).
25. Li, H. J., Ray, S. K., Singh, N. K., Johnston, B. & Leiter, A. B. Basic helix-loop-helix transcription factors and enteroendocrine cell differentiation. Diabetes Obes. Metab. 13, Suppl. 1 5–12 (2011).
26. Gierl, M. S., Karoulias, N., Wende, H., Strehle, M. & Birchmeier, C. The zinc-finger factor Insm1 (IA-1) is essential for the development of pancreatic beta cells and intestinal endocrine cells. Genes Dev. 20, 2465–2478 (2006).
27. Ye, D. Z. & Kaestner, K. H. Foxa1 and Foxa2 control the differentiation of goblet and enteroendocrine L- and D-cells in mice. Gastroenterology 137, 2052–2062 (2009).
28. Larsson, L. I., St-Onge, L., Hougaard, D. M., Sosa-Pineda, B. & Gruss, P. Pax 4 and 6 regulate gastrointestinal endocrine cell development. Mech. Dev. 79, 153–159 (1998).
29. Gross, S. et al. The novel enterochromaffin marker Lmx1a regulates serotonin biosynthesis in enteroendocrine cell lineages downstream of Nkx2.2. Development 143, 2616–2628 (2016).
30. Zetterstrom, R. H. et al. Dopamine neuron agenesis in Nurr1-deficient mice. Science 276, 248–250 (1997).
31. Ferri, A. L. et al. Foxa1 and Foxa2 regulate multiple phases of midbrain dopaminergic neuron development in a dosage-dependent manner. Development 134, 2761–2769 (2007).
32. Puelles, E. et al. Otx2 regulates the extent, identity and fate of neuronal progenitor domains in the ventral midbrain. Development 131, 2037–2048 (2004).
33. Andersson, E. et al. Identification of intrinsic determinants of midbrain dopamine neurons. Cell 124, 393–405 (2006).
34. Sasaki, H. & Hogan, B. L. HNF-3 beta as a regulator of floor plate development. Cell 76, 103–115 (1994).
35. Tailor, J. et al. Stem cells expanded from the human embryonic hindbrain stably retain regional specification and high neurogenic potency. J. Neurosci. 33, 12407–12422 (2013).
36. Ono, Y. et al. Differences in neurogenic potential in floor plate cells along an anteroposterior location: midbrain dopaminergic neurons originate from mesencephalic floor plate cells. Development 134, 3213–3225 (2007).
37. Chen, J. K., Taipale, J., Young, K. E., Maiti, T. & Beachy, P. A. Small molecule modulation of Smoothened activity. Proc. Natl Acad. Sci. USA 99, 14071–14076 (2002).
38. Denham, M. et al. Glycogen synthase kinase 3beta and activin/nodal inhibition in human embryonic stem cells induces a pre-neuroepithelial state that is required for specification to a floor plate cell lineage. Stem Cells 30, 2400–2411 (2012).
39. Ribeiro, D. et al. Dkk1 regulates ventral midbrain dopaminergic differentiation and morphogenesis. PLoS ONE 6, e15786 (2011).
40. Arenas, E., Denham, M. & Villaescusa, J. C. How to make a midbrain dopaminergic neuron. Development 142, 1918–1936 (2015).
41. Kirkeby, A. et al. Predictive markers guide differentiation to improve graft outcome in clinical translation of hesc-based therapy for Parkinson’s disease. Cell Stem Cell 20, 135–148 (2017).
42. Zhang, H. M. et al. AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res. 40, D144–D149 (2012).
43. Meyer, P. E., Kontos, K., Lafitte, F. & Bontempi, G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP J. Bioinform. Syst. Biol. 1, 79879 (2007).
44. Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).
45. Falk, A. et al. Capture of neuroepithelial-like stem cells from pluripotent stem cells provides a versatile system for in vitro production of human neurons. PLoS ONE 7, e29597 (2012).
46. Villaescusa, J. C. et al. A PBX1 transcriptional network controls dopaminergic neuron development and is impaired in Parkinson’s disease. EMBO J. 35, 1963–1978 (2016).

Acknowledgements

S.O. is supported by an FNR CORE grant (C15/BM/10397420), S.R. by University of Luxembourg IRP Grant (R-AGR-3227-11), E.M.T. by a fellowship from the Swedish Research Council. This project was supported by the Swedish Research Council (VR 2016-01526), Swedish Foundation for Strategic Research (SRL program and SB16-0065), European Commission (NeuroStemcellRepair), Karolinska Institutet, Cancerfonden (CAN 2016/572), and Hjärnfonden (FO2015:0202). We would also like to thank Gioele La Manno and Peter Lönneberg for help with single cell data; and the Knut and Alice Wallenberg Foundation for support to the CLICK imaging facility at KI.

Author information

Enrique M. Toledo
Present address: Novo Nordisk Research Centre Oxford (NNRCO), Cellular and Systems Genomics, Oxford, OX3 7BN, United Kingdom
These authors contributed equally: Satoshi Okawa, Carmen Saltó.

Authors and Affiliations

Computational Biology Group, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, avenue du Swing, L-4367, Belvaux, Luxembourg
Satoshi Okawa, Srikanth Ravichandran & Antonio del Sol
Department of Medical Biochemistry and Biophysics, Laboratory of Molecular Neurobiology, Biomedicum 6C, Solnavägen 9, Karolinska Institutet, 17177, Stockholm, Sweden
Carmen Saltó, Shanzheng Yang, Enrique M. Toledo & Ernest Arenas
Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
Antonio del Sol

Contributions

A.d.S. conceived the overall study and designed its computational part; S.O. and S.R. developed the computational method, compiled data sets and performed the analysis; E.A. designed the cell culture experiments; C.S., S.Y., and E.M.T. performed the experiments; S.O., C.S., S.R., E.A., and A.d.S. wrote the manuscript.

Corresponding author

Correspondence to Antonio del Sol.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Okawa, S., Saltó, C., Ravichandran, S. et al. Transcriptional synergy as an emergent property defining cell subpopulation identity enables population shift. Nat Commun 9, 2595 (2018). https://doi.org/10.1038/s41467-018-05016-8

Received: 07 August 2017
Accepted: 06 June 2018
Published: 03 July 2018
DOI: https://doi.org/10.1038/s41467-018-05016-8
