Transcription Factors and Transcriptional Control in Eukaryotic Cells

Citation: Phillips, T. & Hoopes, L. (2008) Transcription factors and transcriptional control in eukaryotic cells. Nature Education 1(1):119

How did eukaryotic organisms become so much more complex than prokaryotic ones, without a whole lot more genes? The answer lies in transcription factors.

Do complex organisms have more genes than simpler organisms? Now that researchers can sequence whole genomes and have done so for a number of organisms, they know that many vertebrates have only about twice as many genes as invertebrates, and many of these are the result of duplication of existing genes rather than development of new ones. But if there are not that many new genes, what is responsible for the incredible diversity in plant and animal species?

The simple answer to this question is that eukaryotes have developed a more complex way of controlling expression of their existing genes than prokaryotes. This system of expression control relies on a group of proteins known as transcription factors (TFs), and it allows eukaryotes to alter their cell types and growth patterns in a variety of ways. TFs are not solely responsible for gene regulation; eukaryotes also rely on cell signaling, RNA splicing, siRNA control mechanisms, and chromatin modifications. However, TFs that bind to cis-regulatory DNA sequences are responsible for either positively or negatively influencing the transcription of specific genes, essentially determining whether a particular gene will be turned "on" or "off" in an organism.

Transcription Factors Recognize Specific DNA Sequences

This solution structure shows the binding of the NFATC1 transcription factor to its DNA binding sequence. The secondary structure of the transcription factor has ten beta strands and a DNA recognition loop.

Figure 1: Solution structure of the core NFATC1-DNA complex.

Topological representation of secondary structure elements in the complex between the NFATC1 transcription factor and its 12-base-pair binding sequence in DNA. The NFATC1-DNA complex shows that NFATC1 is a ten-stranded antiparallel beta-barrel. The two primary sheets (beta-IHFCE and beta-ABG) that form the core of the beta-barrel lie remote from the DNA interface and are almost completely unaffected by being bound to DNA. The third sheet (beta-DG), which does not contact DNA directly but adjoins and abuts multiple segments that do, is also very similar in the free protein and binary complex. The most radical changes that occur upon binding to DNA involve two large surface loops.

Much of the complexity in differentiation in animal and plant cells can be attributed to the evolution of elaborate systems made up of short (6 to 8 base pair) cis-regulatory DNA sequences or motifs, as well as the TFs that bind to the motifs, interact with each other to form complexes, and recruit RNA polymerase II (Levine & Tjian, 2003). Most eukaryotic genes have promoters that consist of the TATA box close to the 5' end of the gene and, farther upstream, several motifs recognized by specific transcription factors. In addition, many genes have one or more other nearby sequences called enhancers. Enhancers affect transcription; these sequences occur upstream, downstream, or within introns, and they continue to work whether in the normal orientation or turned backward in the genome. In yeast, no enhancers are known; instead, there are only upstream activator sequences (UASs). Enhancers can be found thousands of base pairs from a promoter, whereas UASs are generally within a few hundred base pairs upstream. Typical RNA polymerase II promoters can be influenced by many enhancers and by multiple factors bound to the promoter and enhancer sequences.

The mode of action of TFs is to recognize and bind to a segment of DNA in the promoter and/or enhancer region. Often, a change in the conformation, or three-dimensional structure of a TF, will accompany DNA binding. For example, the two loops in NFATC1 that interact with DNA are found in different conformations, depending on whether NFATC1 is complexed with DNA or not (Figure 1). Moreover, the structure of different TF families, described later in this article, results in specific areas in these protein complexes that interact with the DNA recognition motif. The recognition motif is usually only about 6 to 10 base pairs long.
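Because a recognition motif is only a handful of base pairs long, locating candidate binding sites can be pictured as a simple pattern search along both strands of the DNA. The following Python sketch illustrates that idea under stated assumptions: the function names and the example sequence are hypothetical, and the two motifs used (the TATAAA form of the TATA box and the CANNTG "E-box" consensus associated with bHLH factors) are chosen only for illustration. Real binding also depends on chromatin context and on other bound proteins, which a string search does not capture.

```python
import re

# Map each motif symbol to a regular-expression fragment.
# Only the symbols needed for this sketch are included; N means "any base".
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return a sorted list of (position, strand) for every motif match.

    Positions are 0-based and always refer to the given (+) strand.
    A '-' hit means the motif reads 5' to 3' on the opposite strand.
    """
    sequence = sequence.upper()
    hits = set()
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead lets the search report overlapping matches too.
        pattern = re.compile("(?=" + "".join(SYMBOLS[b] for b in query) + ")")
        for match in pattern.finditer(sequence):
            hits.add((match.start(), strand))
    return sorted(hits)


# Hypothetical 18-base-pair stretch of DNA, invented for this example.
example = "GGCAGCTGACTATAAAGC"

print(find_motif(example, "CANNTG"))  # E-box: palindromic, so both strands hit
print(find_motif(example, "TATAAA"))  # TATA box: one strand only
```

In this invented sequence the E-box is found at position 2 on both strands, because CANNTG is its own reverse complement, while the TATA box is found once, at position 10 on the forward strand.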

Experiments have shown that TFs can bind tightly to their target DNA sequences, both within cells and in vitro. After TFs bind to promoter or enhancer regions of the DNA, they interact with other bound TFs and recruit RNA polymerase II. Their influence, however, can be either positive or negative, depending on the presence of other functional domains on the protein and the overall impact of the entire TF complex. A typical TF has multiple functional domains, not only for recognizing and binding to the appropriate DNA strand, but also for interactions with other TFs, with proteins called coactivators, with RNA polymerase II, with chromatin remodeling complexes, and with small noncoding RNAs.

TFs control many important parts of development; therefore, organisms with a deletion of a TF gene exhibit profound irregularities in organization and development (Table 1). For example, in Drosophila, deletion of the TF antennapedia gene results in the development of the antennal imaginal disc into legs rather than antennae.

Table 1: Effects of Some Transcription Factor (TF) Gene Deletions in Drosophila

TF Gene Deleted	Gene Group	Type of TF	Phenotypic Effects Observed
Buttonhead	Gap	Zinc finger	Lack of mandibular, intercalary, and antennal head segments
Hairy	Pair rule	bHLH	Ectopic expression of bristles on legs and wings
Antennapedia	Homeotic	Homeobox	Legs on the head where antennae should be

Transcription Factors Exert Combinatorial Control

Many TFs are known to facilitate transcription at hundreds of different promoters, while some are only active at a select few. Laboratory techniques such as chromatin immunoprecipitation (ChIP) and DNA microarrays are commonly used to study the target DNA motifs recognized by individual TFs (Iyer et al., 2001). Signal molecules can influence activation by TFs by covalently binding or modifying their functional domains. It is even possible for a TF to respond to a physical signal, such as red or far-red light, but the signal must be transduced to a chemically modified activator that interacts with the TF.

The complexity and fine gradations of gene expression in eukaryotes result from combinatorics, in that the combination of chromatin and TF signals, rather than the individual TF signal, is read out. Thus, transcriptional control is dependent on the interactions of all the TFs and whether they attract RNA polymerase or block it from initiating transcription. Multiple TFs can accumulate, forming a complex roughly the size of a ribosome. Once the TFs are bound together, changes to the functional domains of a TF and/or covalent interactions with other factors can turn transcription on or off, depending on whether they allow or prohibit the recruitment of RNA polymerase.

A typical enhancer can be up to 500 base pairs in length and contain multiple binding sites for at least two or three different TFs (Levine & Tjian, 2003). Two TFs bound at sites near one another on the DNA strand can combine to form a dimer and bend the DNA in what is believed to be part of the activation process. Chromatin structure allows activators to associate with one another, even when they are bound to DNA sequences many hundreds of base pairs apart. Some TFs are believed to act as tethering elements between distant enhancers and promoters by forming connections with other proteins.
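The idea that a combination of bound factors is read out, rather than any single factor, can be sketched as a toy truth table. The Python example below is a deliberately simplified Boolean model, not a description of any real enhancer: the three factor names and the rule "transcription is on only when both activators are bound and the repressor is absent" are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical factors with binding sites in one enhancer (names are invented).
FACTORS = ["activator_A", "activator_B", "repressor_R"]


def enhancer_output(bound):
    """Toy readout rule (an assumption, not taken from the article):
    the gene is on only if both activators are bound and the repressor is not."""
    return bound["activator_A"] and bound["activator_B"] and not bound["repressor_R"]


def enumerate_states(factors):
    """List every bound/unbound combination together with its readout."""
    states = []
    for occupancy in product([False, True], repeat=len(factors)):
        bound = dict(zip(factors, occupancy))
        states.append((bound, enhancer_output(bound)))
    return states


states = enumerate_states(FACTORS)
for bound, is_on in states:
    present = [name for name, is_bound in bound.items() if is_bound] or ["none"]
    print(f"{', '.join(present):40s} -> {'ON' if is_on else 'off'}")
```

Three binding sites give 2^3 = 8 possible occupancy states, and under this particular rule only one of them switches the gene on. Changing the rule, or adding a fourth site, changes the set of conditions under which the gene is expressed without requiring any new gene.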

The Evolution of Transcription Factor Families

A three-column diagram shows four eukaryotic DNA-binding motifs, with the name of the DNA-binding motif in the far left column, followed by structural diagrams in the middle column, and a description of their function in the far right column. The middle column shows a schematic representation of each individual molecule at the top and a ribbon diagram of it associated with a double helical DNA molecule at the bottom. The fundamental secondary structural elements of each motif are color-coded.

Figure 2

Higher organisms have a large number of diverse TF families defined by the sequence of their DNA-binding domains. Evolutionary studies have shown that although the DNA-binding motif is highly conserved among plants and animals, the remainder of these organisms' protein sequences is often very different. In addition, a particular TF family may have different roles in plants than in animals, and some new TFs have evolved in each kingdom since their divergence.

In many animals, including humans, a prominent group of genes involved in cell development, including many that encode TFs, contain a 180 base-pair sequence called the homeobox. The homeobox encodes a 60-amino acid protein segment called the homeodomain (180 bases read at three bases per codon), which recognizes and binds to promoters in the DNA of its target genes. Complete control over transcription, and sometimes binding, is dependent on interactions between TFs, so activation often depends upon the presence of another TF. A similar system of gene recognition is found in plants, where the DNA-binding domain is called the MADS box.

TFs often have certain specific DNA-binding motifs, a common one being the basic helix-loop-helix (bHLH) structure that recognizes a specific sequence of DNA and sits on the DNA like a train car on a track. One such example is the TF MyoD (myoblast determination). Expression of the MyoD gene results in production of MyoD protein, which binds to the promoters of muscle-determining genes, causing the differentiation of muscle precursor cells (myoblasts) into muscle fibers. MyoD also binds to its own promoter, thus maintaining its own levels in differentiated muscle cells and their progeny.

In addition to bHLH, there are some other common structural motifs for recognition and binding of DNA, and these are found in most regulatory proteins. These are the helix-turn-helix, zinc finger, and leucine zipper (Figure 2). The NFATC1 example shown in Figure 1 is known as a β-barrel. Proteins having each of these motifs are effective because they fit neatly into the major or minor grooves of the DNA strand, and also because they expose specific amino acids at the appropriate places to form hydrogen bonds with the nucleotide bases. Molecular genetic techniques can be used to change any amino acids to test whether this affects the binding affinity of the TF for the target.

Complexity and Transcription Factors

Complexity of transcriptional control can be illustrated by comparing the number and locations of cis-control elements in higher and lower eukaryotes. For instance, Drosophila typically has several enhancers for a single gene of 2 to 3 kilobases, scattered over a large (10 kilobase) region of DNA, while, as described earlier, yeast have no enhancers but instead use one UAS sequence per gene, located upstream. Long-range regulation is thought to be indicative of the need for a higher level of control over genes involved in cell development and differentiation.

The yeast genome encodes around 300 TFs, or one per every 20 genes, while humans express approximately 3,000 TFs, or one per every 10 genes. With combinatorial control, the twofold increase in TFs per gene actually translates into many more possible combinations of interactions, allowing for the dramatic increase in diversity among organisms. When we consider the additional complexities of chromatin remodeling, regulated mRNA stability, and translational control, it is easier to understand how the cells of higher organisms can produce such an enormous variety of genetic responses to environmental signals.
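The arithmetic behind this argument can be checked directly. The short Python sketch below uses only the figures quoted above (300 and 3,000 TFs, one per 20 and one per 10 genes) and counts how many distinct pairs and triples of TFs could in principle be formed. Treating every pair or triple as a possible regulatory combination is a simplifying assumption; in practice only a fraction of combinations would be biologically meaningful.

```python
from math import comb

# Figures quoted in the text.
tf_counts = {"yeast": 300, "human": 3000}
genes_per_tf = {"yeast": 20, "human": 10}

results = {}
for organism, n_tfs in tf_counts.items():
    results[organism] = {
        # Gene count implied by the quoted TF-to-gene ratio.
        "implied_genes": n_tfs * genes_per_tf[organism],
        # Number of distinct unordered pairs and triples of TFs.
        "pairs": comb(n_tfs, 2),
        "triples": comb(n_tfs, 3),
    }

for organism, r in results.items():
    print(f"{organism}: {r['implied_genes']:,} genes implied, "
          f"{r['pairs']:,} TF pairs, {r['triples']:,} TF triples")

pair_ratio = results["human"]["pairs"] / results["yeast"]["pairs"]
triple_ratio = results["human"]["triples"] / results["yeast"]["triples"]
print(f"pairs: about {pair_ratio:.0f}-fold more; triples: about {triple_ratio:.0f}-fold more")
```

A tenfold increase in the number of TFs yields roughly a 100-fold increase in possible pairs (44,850 versus 4,498,500) and roughly a 1,000-fold increase in possible triples (4,455,100 versus 4,495,501,000), which is why a modest rise in TFs per gene can support a much larger rise in regulatory possibilities.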

Conclusion

The cells of higher organisms exhibit an incredible number of genetic responses to their environment. This is largely the result of TFs that govern the way genes are transcribed and RNA polymerase II is recruited. Through these mechanisms, TFs control important aspects of organismal development. In addition, by working in combination with chromatin, TF signals can exert a finer level of control over DNA by allowing for gradations of expression. TF families further increase the level of genetic complexity in eukaryotes, and many TFs within the same family often work together to affect transcription of a single gene. Given the function of TFs, along with other mechanisms of eukaryotic gene regulation, it is not surprising that complex organisms are capable of doing so much with so few genes. It is these processes, more than the number of genes, that separate complex and simple organisms from each other from a genetic standpoint.

References and Recommended Reading

Chen, K., & Rajewsky, N. The evolution of gene regulation by transcription factors and microRNAs. Nature Reviews Genetics 8, 93–103 (2007) doi:10.1038/nrg1990

Hochschild, A., et al. Repressor structure and the mechanism of positive control. Cell 32, 319–325 (1983) doi:10.1016/0092-8674(83)90451-8

Iyer, V., et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001) doi:10.1038/35054095

Levine, M., & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003) doi:10.1038/nature01763

Sadava, D., et al. Life: The Science of Biology (Gordonsville, VA, W. H. Freeman, 2006)