A census of human transcription factors: function, expression and evolution

Key Points

  • Transcription factors (TFs) are key cellular components that control gene expression. Their activity determines how cells function and respond to cellular environments.

  • Despite intense scientific interest in transcriptional regulation, there is currently no reliable repertoire of the TFs encoded in the human genome, and research in this field will become severely limited without such information.

  • We present an analysis of 1,391 sequence-specific DNA-binding TFs encoded in the human genome, covering their functions, genomic organization and evolutionary conservation. Manual curation of each regulator ensures that the data set is of high quality.

  • Less than 30% of TFs have a known regulatory function. Therefore, by integrating genome-scale data sets we assess several properties that provide initial insights into their functions.

  • We analysed expression profiles across numerous healthy organs and tissues, enabling us to define sets of global and tissue-specific regulators that act as combinatorial partners in a two-tier regulatory system.

  • We compared the genomes of 24 eukaryotic organisms, revealing how distinct types of regulators appeared at crucial points during the evolution of the human lineage.

  • One-fifth of TF-encoding genes reside in high-density clusters, which arose from a series of recent recombination events. This clustering might allow their expression to be coordinately controlled.

  • Although this is a first analysis and much remains to be explored, these observations offer useful starting points for further investigations of the regulatory mechanisms underlying biological processes in humans. Eventually, it may be possible to predict the outcomes resulting from the activity of different TFs.

Abstract

Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.

Figure 1: Current state of knowledge about transcription factors in the human genome.
Figure 2: Transcription factors classified by DNA-binding domain.
Figure 3: Expression level of transcription factors in 32 human organs and tissues.
Figure 4: Heat map representation of transcription factor expression in 32 human organs and tissues.
Figure 5: Conservation of human transcription factors across 24 eukaryotic genomes.
Figure 6: Locations of transcription factor clusters in the human genome.

Acknowledgements

The authors thank J. Herrero and A. Vilella for assistance using Ensembl Compara. J.M.V. thanks the Spanish Ministry of Education and Science, the European Science Foundation Exchange Grant and the Marie Curie Biostar programmes for funding. N.M.L. acknowledges funding from EMBL.

Author information

Corresponding authors

Correspondence to Juan M. Vaquerizas or Nicholas M. Luscombe.

Related links

FURTHER INFORMATION

Human TF repertoire

Luscombe laboratory

Teichmann laboratory

DBD

EdgeDB

Ensembl Compara database

Ensembl Genome Browser

FlyTF

FTFD

Gene Ontology Home

InterPro

IPI

JASPAR

OMIM

Pfam

PlnTFDB

PubMed

RegulonDB

SGD

SUPERFAMILY

SymAtlas

TRANSFAC

Glossary

General transcription factor

One of a group of proteins that are essential for transcription from a eukaryotic promoter. They are involved in the formation of the pre-initiation complex and the recruitment of RNA polymerase.

Co-factor

A protein or small molecule that modulates the activity of an enzyme or of another protein complex.

Histone

A small highly conserved basic protein, found in the chromatin of all eukaryotic cells. Histones associate with DNA to form nucleosomes.

Chromatin remodelling protein

A protein that mediates transient changes in chromatin accessibility by modifying the methylation or acetylation status of histones or the methylation status of cytosine residues in DNA.

Gene Ontology

(GO). A widely used classification system of gene functions and other gene attributes that uses a controlled vocabulary.

InterPro

A database of conserved protein families, domains and motifs that can be used to annotate amino acid sequences. The presence of a protein domain is often indicative of a particular molecular function.

SELEX

A procedure to identify protein ligands. For DNA-binding proteins, the protein is mixed with a pool of double-stranded oligonucleotides that contain a random core of nucleotides flanked by specific sequences. The protein–DNA complex is recovered, the oligonucleotides amplified by PCR and sequenced to reveal the binding specificity of the protein.

Orthologues

Loci in two species that are derived from a common ancestral locus by a speciation event. This is different from paralogous members of a gene family that are derived from duplication events.

RNA-Seq

The use of high-throughput sequencing techniques for transcriptomic profiling.

Propensity values

A measure of tissue specificity that normalizes the expression value of a TF both across all samples and against the expression of all TFs in a single sample. Propensities are also commonly used to measure the distribution of amino acid types in different features of protein structures.
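
As a rough illustration of the propensity definition above, the following sketch computes one common form of a propensity ratio: the fraction of a TF's total expression found in a tissue, divided by the fraction of all TF expression found in that tissue. The exact formula is not stated here, so this form is an assumption, and the function name, TF names and expression values are invented for the example.

```python
from typing import Dict

# Hypothetical layout: TF name -> tissue name -> expression level.
Expression = Dict[str, Dict[str, float]]


def propensities(expr: Expression) -> Expression:
    """Return an assumed propensity ratio for every TF in every tissue.

    propensity(tf, tissue) =
        (expr[tf][tissue] / total expression of tf across tissues)
        / (total expression of all TFs in tissue / grand total)

    Assumes every TF has a non-zero total and every tissue a non-zero total.
    """
    # Total expression of each TF across all tissues.
    tf_totals = {tf: sum(levels.values()) for tf, levels in expr.items()}

    # Total expression of all TFs within each tissue.
    tissue_totals: Dict[str, float] = {}
    for levels in expr.values():
        for tissue, value in levels.items():
            tissue_totals[tissue] = tissue_totals.get(tissue, 0.0) + value

    grand_total = sum(tf_totals.values())

    result: Expression = {}
    for tf, levels in expr.items():
        result[tf] = {}
        for tissue, value in levels.items():
            tf_fraction = value / tf_totals[tf]
            tissue_fraction = tissue_totals[tissue] / grand_total
            result[tf][tissue] = tf_fraction / tissue_fraction
    return result


# Toy data, invented for illustration only.
toy = {
    "TF_A": {"liver": 90.0, "brain": 10.0},
    "TF_B": {"liver": 50.0, "brain": 50.0},
}
toy_propensities = propensities(toy)
```

Under this form, a value above 1 means the TF is over-represented in that tissue relative to TFs overall, and a value below 1 means it is under-represented.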

Fisher's exact test

A statistical test of independence between two categorical variables.
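
To make the test concrete, here is a minimal sketch of a one-sided Fisher's exact test on a 2×2 table, computed directly from the hypergeometric distribution. The function name and the example counts are invented for illustration and are not taken from the analysis described in this article.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with row and column totals held fixed,
    of observing a top-left count of `a` or larger.
    """
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d     # grand total
    denom = comb(n, col1)

    p_value = 0.0
    # Sum hypergeometric probabilities for top-left counts from a upwards.
    for k in range(a, min(row1, col1) + 1):
        p_value += comb(row1, k) * comb(n - row1, col1 - k) / denom
    return p_value


# Invented toy table: rows are two categories of gene,
# columns are "has property" and "lacks property".
p = fisher_exact_greater(3, 1, 1, 3)
```

For this toy table the sum has two terms, 16/70 and 1/70, giving 17/70. For real analyses an established implementation such as `scipy.stats.fisher_exact` would normally be preferred over a hand-written one.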

Protein-binding array

A high-throughput technique to determine DNA-binding affinities of proteins using microarrays displaying synthetic oligonucleotides.

ChIP-Seq

The combination of chromatin immunoprecipitation (ChIP) experiments with high-throughput sequencing techniques to quantitate protein targeting or chromatin modifications across the entire genome.

Cite this article

Vaquerizas, J., Kummerfeld, S., Teichmann, S. et al. A census of human transcription factors: function, expression and evolution. Nat Rev Genet 10, 252–263 (2009). https://doi.org/10.1038/nrg2538
