Gene-pair expression signatures reveal lineage control

Journal name:
Nature Methods


The distinct cell types of multicellular organisms arise owing to constraints imposed by gene regulatory networks on the collective change of gene expression across the genome, creating self-stabilizing expression states, or attractors. We curated human expression data comprising 166 cell types and 2,602 transcription-regulating genes and developed a data-driven method for identifying putative determinants of cell fate built around the concept of expression reversal of gene pairs, such as those participating in toggle-switch circuits. This approach allows us to organize the cell types into their ontogenic lineage relationships. Our method identifies genes in regulatory circuits that control neuronal fate, pluripotency and blood cell differentiation, and it may be useful for prioritizing candidate factors for direct conversion of cell fate.

At a glance


  1. Figure 1: Gene-pair expression-reversal analysis.

    (a) Ranks of two hypothetical genes g and g′ scaled by the total number of genes, plotted from microarray samples assigned to three hypothetical cell types. δ, normalized mean rank difference of two genes. (b) Gene pair–reversal plot. The reversal behavior of n = 3 cell types for the {g, g′} gene pair is shown as an n × n symmetric matrix. The Δ value, indicating the extent of reversal behavior, is represented by the color in the heat map; gray corresponds to Δ = 0, red tones indicate that the configuration changes from g ≫ g′ in the first cell type to g ≪ g′ in the second cell type, and opposite reversals are indicated in blue. (c) Reversal participation. The Ψ value for gene g quantifies reversal participation, a measure of the number and strength of reversals in which the gene participates. The matrix on the left displays a cell portrait in which rows correspond to the reversal participation scores of genes for those pairwise cell type comparisons involving cell type 12 (comparisons to self are indicated in dark blue). The portraits are sorted to show the highest-scoring genes on top. Alternatively, for assessing reversal participation of a particular gene across all cell types (here, pairwise comparisons of 32 hypothetical cell types), the Ψ values can be visualized as gene portraits (note the corresponding rows in the two matrices: the row showing reversal participation of gene g in cell type 12 on the left matches the row for cell type 12 in the gene portrait shown on the right).

  2. Figure 2: Reversal participation analysis in ESCs.

    (a) First 100 rows (of 2,602 TF genes evaluated) of the ESC portrait; the names of the top 20 ESC-specific transcription-regulating genes are indicated (refer to Supplementary Table 3 for the order of cell types in columns). (b) Plots showing gene-reversal portraits, ENCODE (ref. 12) RNA-seq data (R) and ChIP-seq histone methylation data (C) for the top 20 ESC-specific genes. H3K4me3 data are shown for six ENCODE cell types: human ESC line H1 (H1 ES), breast epithelial cell (HMEC), skeletal muscle myoblast (HSMM), umbilical vein endothelial cell (HUVEC), epithelial keratinocyte (NHEK) and lung fibroblast (NHLF). RNA-seq data are shown for H1 ES, HUVEC and NHEK cells. Ψ, reversal participation score.

  3. Figure 3: Reversal participation analysis of a candidate gene set for neuronal specification.

    The reversal participation (Ψ) portraits of 19 candidate genes for inducing fibroblast-to-neuron conversion (ref. 14) are shown. ASCL1 was previously found to be the most potent on its own for this induction. The ordering of the portraits reflects previous experimental success with induction of neuronal fate in combination with ASCL1. The gray bar indicates the location (rows) of neuronal cells in each portrait. The combination of genes indicated in bold resulted in the best reprogramming efficiency.

  4. Figure 4: Identification of reversal pairs in lineage splits of the blood system.

    (a) The tree shows early splits in the blood lineage. Lineage-determining TF gene pairs at binary splits are expected to follow the reversal pattern shown in the idealized gene pair–reversal plots for pairwise comparisons of the hematopoietic stem cell (HSC), erythroid, myeloid and lymphoid cell types. An ideal pair will also show no reversals for other cell type pairs in the full 166 × 166 cell type comparison matrix. (b,c) Gene pair–reversal plots for cell types from the blood lineage used in ranking (top row) and for all cell type comparisons (bottom row). Pairs of TF genes that show statistically significant restricted reversal in the 166 × 166 cell type data are shown with their P values (hypergeometric test) for the erythroid-myeloid (b) and B-T lymphoid (c) splits. Color in the gene pair–reversal plots is as in Figure 1b and corresponds to the Δ value indicating the extent of reversal.

  5. Figure 5: Lineage relationships based on gene-pair expression reversals.

    An evaluation of the utility of the similarity Φ for reflecting lineage separation is shown. (a) Hierarchical clustering of 29 differentiated cell types based on similarity Φ. The circular dendrogram in the xy plane arranges cells into branching lineages. Ten precursor cell types placed at branch points according to the Hungarian algorithm (Online Methods) are indicated. The landscape elevation z represents the Φ similarity to ESCs. Blue color and high altitude on the landscape correspond to high similarity to the pluripotent state. (b,c) To represent all 166 cell types, landscapes as in a are shown with multidimensional scaling for (b) TF genes or (c) metabolic genes (ref. 41).
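
The three scores named in the Figure 1 caption (δ, Δ and Ψ) can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation. The caption defines δ as a normalized mean rank difference but gives no formula for Δ or Ψ, so the sign-flip rule in `reversal` and the sum of |Δ| in `participation` are assumptions. The function names and the convention that a higher rank means higher expression are also assumptions.

```python
from statistics import mean


def delta(ranks_g, ranks_h, n_genes):
    """Normalized mean rank difference of two genes within one cell type.

    ranks_g and ranks_h hold the per-sample ranks of the two genes;
    n_genes is the total number of ranked genes (used for scaling).
    """
    return mean((rg - rh) / n_genes for rg, rh in zip(ranks_g, ranks_h))


def reversal(delta_a, delta_b):
    """Assumed Delta: zero unless the sign of delta flips between cell types.

    When it flips, the strength is the smaller of the two magnitudes, with
    a positive sign if the first gene ranks higher in the first cell type.
    """
    if delta_a * delta_b >= 0:
        return 0.0
    strength = min(abs(delta_a), abs(delta_b))
    return strength if delta_a > 0 else -strength


def participation(gene, ranks_a, ranks_b, n_genes):
    """Assumed Psi: summed |Delta| over all pairs that contain `gene`.

    ranks_a and ranks_b map gene name -> per-sample ranks in cell types a, b.
    """
    total = 0.0
    for other in ranks_a:
        if other == gene:
            continue
        d_a = delta(ranks_a[gene], ranks_a[other], n_genes)
        d_b = delta(ranks_b[gene], ranks_b[other], n_genes)
        total += abs(reversal(d_a, d_b))
    return total


# Toy data: three genes, two samples per cell type, four ranked genes in total.
toy_a = {"g": [4, 4], "h": [1, 1], "k": [2, 2]}
toy_b = {"g": [1, 1], "h": [4, 4], "k": [2, 2]}
psi_g = participation("g", toy_a, toy_b, n_genes=4)
```

In the toy data, g outranks h in cell type a and the order is inverted in cell type b, so the {g, h} pair contributes a reversal to the participation score of g.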

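The Figure 4 caption reports P values from a hypergeometric test for reversals restricted to a lineage split. The sketch below computes an upper-tail hypergeometric probability with the Python standard library only. How the test population was defined is not stated in the caption. Treating all unordered cell type pairs as the population, and the comparisons spanning the split as the "successes", is an assumption made here for illustration, and every count in the example is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of items (here: cell type comparisons)
    successes:  items of the class of interest (comparisons spanning the split)
    draws:      items selected (comparisons in which the gene pair reverses)
    k:          selected items that belong to the class of interest
    """
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


# Number of unordered comparisons among 166 cell types.
n_comparisons = comb(166, 2)

# Hypothetical counts: 40 comparisons span the split, the gene pair reverses
# in 30 comparisons overall, and 25 of those reversals fall on the split.
p_value = hypergeom_upper_tail(25, n_comparisons, 40, 30)
```

A small `p_value` would indicate that the reversals of the pair are concentrated on the split more than expected by chance under this assumed null model.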

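The Figure 5 caption states that precursor cell types were placed at branch points with the Hungarian algorithm, which solves the minimum-cost assignment problem. The sketch below solves the same problem by exhaustive search over permutations, which is not the Hungarian algorithm but returns the same optimum for small inputs. The cost matrix is hypothetical; the caption does not say how the cost of placing a precursor at a branch point was defined.

```python
from itertools import permutations


def assign_min_cost(cost):
    """Return (assignment, total) minimizing sum of cost[i][assignment[i]].

    cost is a square matrix: cost[i][j] is the (assumed) dissimilarity
    between precursor i and branch point j. Exhaustive search is only
    practical for small matrices because it visits n! permutations.
    """
    n = len(cost)

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)


# Hypothetical costs for three precursors (rows) and three branch points (columns).
toy_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
placement, placement_cost = assign_min_cost(toy_cost)
```

For larger inputs, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.
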
  1. Alberts, B. et al. Cells and genomes. in Molecular Biology of the Cell 3rd edn. Ch. 22 (Garland Science, New York, 1994).
  2. Zhou, J.X. & Huang, S. Understanding gene circuits at cell-fate branch points for rational cell reprogramming. Trends Genet. 27, 55–62 (2011).
  3. Kauffman, S.A. Control circuits for determination and transdetermination. Science 181, 310–318 (1973).
  4. Kauffman, S.A., Shymko, R.M. & Trabert, K. Control of sequential compartment formation in Drosophila. Science 199, 259–270 (1978).
  5. Zhang, P. et al. Negative cross-talk between hematopoietic regulators: GATA proteins repress PU.1. Proc. Natl. Acad. Sci. USA 96, 8705–8710 (1999).
  6. Huang, S. et al. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev. Biol. 305, 695–713 (2007).
  7. Geman, D., d'Avignon, C., Naiman, D.Q. & Winslow, R.L. Classifying gene expression profiles from pairwise mRNA comparisons. Stat. Appl. Genet. Mol. Biol. 3, Article 19 (2004).
  8. Tan, A.C. et al. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics 21, 3896–3904 (2005).
  9. Price, N.D. et al. Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas. Proc. Natl. Acad. Sci. USA 104, 3414–3419 (2007).
  10. Waddington, C.H. The Strategy of the Genes: A Discussion of Some Aspects of Theoretical Biology (Allen & Unwin, London, 1957).
  11. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920 (2007).
  12. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
  13. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).
  14. Vierbuchen, T. et al. Direct conversion of fibroblasts to functional neurons by defined factors. Nature 463, 1035–1041 (2010).
  15. Grass, J.A. et al. GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling. Proc. Natl. Acad. Sci. USA 100, 8811–8816 (2003).
  16. Laslo, P. et al. Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126, 755–766 (2006).
  17. Hu, M. et al. Multilineage gene expression precedes commitment in the hemopoietic system. Genes Dev. 11, 774–785 (1997).
  18. Zhou, J.X., Brusch, L. & Huang, S. Predicting pancreas cell fate decisions and reprogramming with a hierarchical multi-attractor model. PLoS ONE 6, e14752 (2011).
  19. Hosoya, T. et al. GATA-3 is required for early T lineage progenitor development. J. Exp. Med. 206, 2987–3000 (2009).
  20. Miranda-Saavedra, D. & Göttgens, B. Transcriptional regulatory networks in haematopoiesis. Curr. Opin. Genet. Dev. 18, 530–535 (2008).
  21. Swiers, G., Patient, R. & Loose, M. Genetic regulatory networks programming hematopoietic stem cells and erythroid lineage specification. Dev. Biol. 294, 525–540 (2006).
  22. Feinberg, M.W. et al. The Kruppel-like factor KLF4 is a critical regulator of monocyte differentiation. EMBO J. 26, 4138–4148 (2007).
  23. Hoang, T. et al. Opposing effects of the basic helix-loop-helix transcription factor SCL on erythroid and monocytic differentiation. Blood 87, 102–111 (1996).
  24. Ma, C. & Staudt, L.M. LAF-4 encodes a lymphoid nuclear protein with transactivation potential that is homologous to AF-4, the gene fused to MLL in t(4;11) leukemias. Blood 87, 734–745 (1996).
  25. Nagasawa, M., Schmidlin, H., Hazekamp, M.G., Schotte, R. & Blom, B. Development of human plasmacytoid dendritic cells depends on the combined action of the basic helix-loop-helix factor E2-2 and the Ets factor Spi-B. Eur. J. Immunol. 38, 2389–2400 (2008).
  26. Hagman, J., Belanger, C., Travis, A., Turck, C.W. & Grosschedl, R. Cloning and functional characterization of early B-cell factor, a regulator of lymphocyte-specific gene expression. Genes Dev. 7, 760–773 (1993).
  27. Zandi, S. et al. EBF1 is essential for B-lineage priming and establishment of a transcription factor network in common lymphoid progenitors. J. Immunol. 181, 3364–3372 (2008).
  28. Lukin, K. et al. A dose-dependent role for EBF1 in repressing non-B-cell-specific genes. Eur. J. Immunol. 41, 1787–1793 (2011).
  29. Dontje, W. et al. Delta-like1-induced Notch1 signaling regulates the human plasmacytoid dendritic cell versus T-cell lineage decision through control of GATA-3 and Spi-B. Blood 107, 2446–2452 (2006).
  30. Rosa, A. et al. The interplay between the master transcription factor PU.1 and miR-424 regulates human monocyte/macrophage differentiation. Proc. Natl. Acad. Sci. USA 104, 19849–19854 (2007).
  31. Wei, G. et al. Genome-wide analyses of transcription factor GATA3-mediated gene regulation in distinct T cell types. Immunity 35, 299–311 (2011).
  32. Treiber, T. et al. Early B cell factor 1 regulates B cell gene networks by activation, repression, and transcription-independent poising of chromatin. Immunity 32, 714–725 (2010).
  33. Duarte, N.C. et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. USA 104, 1777–1782 (2007).
  34. Pardo, M. et al. An expanded Oct4 interaction network: implications for stem cell biology, development, and disease. Cell Stem Cell 6, 382–395 (2010).
  35. Kashyap, V. et al. Regulation of stem cell pluripotency and differentiation involves a mutual regulatory circuit of the NANOG, OCT4, and SOX2 pluripotency transcription factors with polycomb repressive complexes and stem cell microRNAs. Stem Cells Dev. 18, 1093–1108 (2009).
  36. Li, J.-Y. et al. Synergistic function of DNA methyltransferases Dnmt3a and Dnmt3b in the methylation of Oct4 and Nanog. Mol. Cell Biol. 27, 8748–8759 (2007).
  37. Sinkkonen, L. et al. MicroRNAs control de novo DNA methylation through regulation of transcriptional repressors in mouse embryonic stem cells. Nat. Struct. Mol. Biol. 15, 259–267 (2008).
  38. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
  39. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129–1133 (2010).
  40. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150, 1274–1286 (2012).
  41. Wu, Z. & Irizarry, R.A. Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J. Comput. Biol. 12, 882–893 (2005).
  42. Nishikawa, S.I. et al. Progressive lineage analysis by cell sorting and culture identifies FLK1+VE-cadherin+ cells at a diverging point of endothelial and hemopoietic lineages. Development 125, 1747–1757 (1998).
  43. Allen, C.D.C., Okada, T. & Cyster, J.G. Germinal-center organization and cellular dynamics. Immunity 27, 190–202 (2007).
  44. Burkard, R., Dell'Amico, M. & Martello, S. Assignment Problems (SIAM, Philadelphia, 2009).
  45. McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

Author information


  1. Life Sciences Research Unit, University of Luxembourg, Luxembourg, Luxembourg.

    • Merja Heinäniemi,
    • Anke Wienecke-Baldacchino &
    • Lasse Sinkkonen
  2. Luxembourg Centre for Systems Biomedicine, Esch-sur-Alzette, Luxembourg.

    • Merja Heinäniemi
  3. Department of Biotechnology and Molecular Medicine, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland.

    • Merja Heinäniemi
  4. Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland.

    • Merja Heinäniemi
  5. Department of Signal Processing, Tampere University of Technology, Tampere, Finland.

    • Matti Nykter &
    • Stuart A Kauffman
  6. Institute of Biomedical Technology, University of Tampere, Tampere, Finland.

    • Matti Nykter
  7. Institute for Systems Biology, Seattle, Washington, USA.

    • Roger Kramer,
    • Joseph Xu Zhou,
    • Richard Kreisberg,
    • Stuart A Kauffman,
    • Sui Huang &
    • Ilya Shmulevich
  8. Institute for Biocomplexity and Informatics, University of Calgary, Calgary, Alberta, Canada.

    • Joseph Xu Zhou,
    • Stuart A Kauffman &
    • Sui Huang
  9. Complex Systems Center, University of Vermont, Burlington, Vermont, USA.

    • Stuart A Kauffman
  10. Present address: Department of Immunology, Laboratoire National de Santé, Centre de Recherche Public de la Santé, Luxembourg, Luxembourg.

    • Anke Wienecke-Baldacchino


M.H., M.N., R. Kramer and I.S. designed the gene-pair analysis, and M.H. and R. Kramer performed the analysis. M.H. and A.W.-B. designed the gene curation pipeline, and M.H., A.W.-B. and L.S. curated the genes. M.N., M.H., J.X.Z., S.A.K., S.H. and I.S. designed the clustering experiments and visualization of cell type dissimilarities. M.N. designed the branch-point placement algorithm. M.H. and M.N. compiled the ChIP-seq validations. M.H. and S.H. designed the reversal participation analysis. R. Kreisberg, M.H., M.N. and I.S. designed the content of the online resource. M.H., M.N., R. Kramer, S.H. and I.S. wrote the manuscript. All authors commented on the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Supplementary information

PDF files

  1. Supplementary Text and Figures (3 MB)

    Supplementary Figures 1–13, Supplementary Tables 4 and 6 and Supplementary Results

Excel files

  1. Supplementary Table 1 (53 KB)

    Cell type and tissue ontology terms

  2. Supplementary Table 2 (254 KB)

    Microarray samples mapped to ontology terms

  3. Supplementary Table 3 (33 KB)

    The order of cell types as it appears in the heat maps presented

  4. Supplementary Table 5 (315 KB)

    Functional evidence for a role in transcription regulation found in the gene-set curation

  5. Supplementary Table 7 (74 KB)

    Identification of candidate toggle pairs

  6. Supplementary Table 8 (729 KB)

    Rank-based differential expression analysis comparison using RCoS

  7. Supplementary Table 9 (254 KB)

    Rank-based differential expression analysis comparison using RDAM

  8. Supplementary Table 10 (25 KB)

    Public ChIP-seq data sets used

  9. Supplementary Table 11 (1 MB)

    Genomic region enrichment results for GATA1 ChIP-seq data

  10. Supplementary Table 12 (733 KB)

    Genomic region enrichment results for TAL1 ChIP-seq data

  11. Supplementary Table 13 (2 MB)

    Genomic region enrichment results for SPI1 ChIP-seq data

  12. Supplementary Table 14 (995 KB)

    Genomic region enrichment results for EBF1 ChIP-seq data

  13. Supplementary Table 15 (3 MB)

    Genomic region enrichment results for GATA3 ChIP-seq data

  14. Supplementary Table 16 (119 KB)

    Mouse knockout phenotypes of Gata1, Tal1, Sfpi1, Ebf1 and Gata3

  15. Supplementary Table 17 (102 KB)

    Additional microarray data used for validation.

Zip files

  1. Supplementary Software (5 MB)

    Online data resource and tool TREL. The online data resource and interactive tool, encompassing pairwise comparisons of the genes and cell types presented in this article, is available to explore transcriptome diversity in metazoa; this resource is accompanied by a user guide and video tutorial.