Transcription factors (TFs) are key cellular components that control gene expression. Their activity determines how cells function and respond to cellular environments.
Despite intense scientific interest in transcriptional regulation, there is currently no reliable repertoire of TFs encoded in the human genome, and research in this field will become severely limited without such information.
We present an analysis of 1,391 sequence-specific DNA-binding TFs for the human genome, and their functions, genomic organization and evolutionary conservation. Manual curation of each regulator ensures that the data set is of high quality.
Fewer than 30% of TFs have a known regulatory function. Therefore, by integrating genome-scale data sets, we assess several properties that provide initial insights into their functions.
We analysed expression profiles across numerous healthy organs and tissues, enabling us to define sets of global and tissue-specific regulators that act as combinatorial partners in a two-tier regulatory system.
We compared the genomes of 24 eukaryotic organisms, revealing how distinct types of regulators appeared at crucial points during the evolution of the human lineage.
One-fifth of TF-encoding genes reside in high-density clusters, which arose from a series of recent recombination events. This clustering might allow their expression to be coordinately controlled.
Although this is a first analysis and much remains to be explored, these observations offer useful starting points for further investigations of the regulatory mechanisms underlying biological processes in humans. Eventually, it may be possible to predict the outcomes resulting from the activity of different TFs.
Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
The authors thank J. Herrero and A. Vilella for assistance using Ensembl Compara. J.M.V. thanks the Spanish Ministry of Education and Science, the European Science Foundation Exchange Grant and the Marie Curie Biostar programmes for funding. N.M.L. acknowledges funding from EMBL.
- General transcription factor
One of a group of proteins that are essential for transcription from a eukaryotic promoter. They are involved in the formation of the pre-initiation complex and the recruitment of RNA polymerase.
A protein or small molecule that modulates the activity of an enzyme or of another protein complex.
- Histone
A small, highly conserved basic protein found in the chromatin of all eukaryotic cells. Histones associate with DNA to form nucleosomes.
- Chromatin remodelling protein
A protein that mediates transient changes in chromatin accessibility by modifying the methylation or acetylation status of histones or the methylation status of cytosine residues in DNA.
- Gene Ontology
(GO). A widely used classification system of gene functions and other gene attributes that uses a controlled vocabulary.
- InterPro
A database of conserved protein families, domains and motifs that can be used to annotate amino acid sequences. The presence of a protein domain is often indicative of a particular molecular function.
- SELEX
A procedure to identify protein ligands. For DNA-binding proteins, the protein is mixed with a pool of double-stranded oligonucleotides that contain a random core of nucleotides flanked by specific sequences. The protein–DNA complex is recovered, and the oligonucleotides are amplified by PCR and sequenced to reveal the binding specificity of the protein.
- Orthologous
Loci in two species that are derived from a common ancestral locus by a speciation event. This is different from paralogous members of a gene family, which are derived from duplication events.
- RNA-seq
The use of high-throughput sequencing techniques for transcriptomic profiling.
- Propensity values
A measure of tissue specificity that normalizes the expression value of a TF across all samples, and the expression of all TFs in a single sample. It is commonly used to measure the distribution of amino acid types in different features of protein structures.
- Fisher's exact test
A statistical test of independence between two categorical variables.
- Protein-binding array
A high-throughput technique to determine DNA-binding affinities of proteins using microarrays displaying synthetic oligonucleotides.
- ChIP–seq
The combination of chromatin immunoprecipitation (ChIP) experiments with high-throughput sequencing techniques to quantitate protein targeting or chromatin modifications across the entire genome.
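The glossary entries for propensity values and Fisher's exact test can be illustrated with a minimal Python sketch. It is not the authors' implementation. The propensity formula used here (a TF's share of its own total expression in a tissue, divided by that tissue's share of all TF expression) is an assumption based on the usual form of propensity scores, and the TF names, tissue names and expression values are hypothetical.

```python
from math import comb


def propensity(expr):
    """Propensity of each TF for each tissue.

    expr maps TF name -> {tissue name -> expression value}.
    Assumed formula: (e_ij / total for TF i) / (total for tissue j / grand total).
    Assumes every TF total and every tissue total is positive.
    """
    grand_total = sum(v for row in expr.values() for v in row.values())
    tissue_totals = {}
    for row in expr.values():
        for tissue, value in row.items():
            tissue_totals[tissue] = tissue_totals.get(tissue, 0.0) + value
    result = {}
    for tf, row in expr.items():
        tf_total = sum(row.values())
        result[tf] = {
            tissue: (value / tf_total) / (tissue_totals[tissue] / grand_total)
            for tissue, value in row.items()
        }
    return result


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Hypothetical expression values for two TFs in two tissues.
example = {
    "TF_A": {"liver": 9.0, "brain": 1.0},
    "TF_B": {"liver": 1.0, "brain": 9.0},
}
scores = propensity(example)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

Under this assumed definition, a propensity above 1 means the TF is expressed in that tissue more than expected from the overall totals, and a value below 1 means less. The Fisher's exact test function takes the four cell counts of a contingency table, for example TFs that are or are not tissue-specific against TFs that do or do not belong to a given family.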
About this article
Cite this article
Vaquerizas, J., Kummerfeld, S., Teichmann, S. et al. A census of human transcription factors: function, expression and evolution. Nat Rev Genet 10, 252–263 (2009). https://doi.org/10.1038/nrg2538