
Translation initiation in eukaryotes

The expression of many eukaryotic genes is regulated at the level of translation initiation. The first step in the initiation process is binding of the small ribosomal subunit (40S) to the mRNA. Because this is often the rate-limiting step in initiation, it is also a frequent target of regulation. This review discusses recent progress in understanding the proteins that mediate ribosome binding to eukaryotic mRNA.

In prokaryotes, base pairing between rRNA and the ‘Shine–Dalgarno’ sequence preceding the initiation codon of each open reading frame plays a dominant role in the ribosome binding step. In eukaryotes, however, the binding step is much more complex. The model that best represents the available data (Fig. 1a) posits that the multisubunit eukaryotic initiation factor 3 (eIF3), which is associated with the 40S ribosomal subunit, also binds to the mRNA-associated eIF4G protein and thereby links the ribosome to the mRNA (reviewed in ref. 1). Understanding how eIF4G becomes associated with mRNA is thus central to understanding ribosome binding and translational control.

Figure 1: Organization of the eukaryotic translation initiation apparatus.

a, The eIF4G protein serves as a scaffold that connects the mRNA and its associated proteins to the ribosome. Binding of eIF4E to eIF4G is inhibited by the eIF4E binding proteins, the 4E-BPs, while 4E-BP binding to eIF4E is disrupted by phosphorylation. b, Domain structure of the yeast eIF4G1 protein. The relative position of binding sites for various translation factors on yeast eIF4G1 is shown. The mammalian eIF4G proteins have extended C-termini that bind to other factors. Shown below are the amino acid sequences surrounding the eIF4E binding site in the human (h) and yeast (y) eIF4G proteins and the eIF4E-binding proteins 4E-BP1 and Caf20p. The positions of the amino acids relative to the conserved tyrosine at position 0, as well as the minimal consensus sequence for this region, including the position of the conserved hydrophobic residue (Φ), are shown.

In most cases, eIF4G not only binds directly to mRNA but also associates with the cap structure at the 5′ end and the poly(A) tail at the 3′ end (reviewed in ref. 2). Initiation factor eIF4G binds the cap and the poly(A) tail indirectly, through its interactions with the cap binding protein eIF4E and the poly(A) binding protein Pab1p, respectively. The simultaneous interaction of eIF4G with eIF4E and Pab1p not only places eIF4G on the mRNA but also leads to mRNA circularization3.

Translation initiation and cell viability depend upon eIF4G binding to eIF4E at the 5′ end of mRNA4,5. Initiation factor eIF4E binds the methylated guanosine cap of mRNA with submicromolar affinity, and binds nonmethylated precursors at least five-fold less efficiently6. Thus, the eIF4G–eIF4E complex binds to the 5′ end of mRNA because eIF4E specifically recognizes the cap. In addition, the association of eIF4G with eIF4E increases the affinity of eIF4E for the cap by at least 10-fold7,8. This allosteric effect gives the initiation-competent eIF4E–eIF4G complex an advantage over apo eIF4E in binding to mRNA. In mammals, the eIF4E–cap interaction is regulated by phosphorylation of eIF4E by the eIF4G-associated Mnk1 kinase9. Phosphorylation further enhances the affinity of eIF4E for both the cap structure and eIF4G10,11. Recent structural work on eIF4E has helped clarify many features of the cap recognition process12,13,14.
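To put these fold changes on an energy scale, a fold change f in a dissociation constant corresponds to a free-energy difference ΔΔG = RT ln f. The following is a minimal sketch of that conversion, assuming a temperature of 25 °C and treating the quoted lower bounds (five-fold and 10-fold) as exact ratios; the function name is hypothetical.

```python
import math

# Assumptions for this sketch: 25 degrees C, gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3
TEMPERATURE_K = 298.15


def ddg_from_fold_change(fold: float, temperature: float = TEMPERATURE_K) -> float:
    """Return RT*ln(fold) in kcal/mol for a fold change in dissociation constant.

    A positive value is how much more stable the tighter complex is.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature * math.log(fold)


# Lower bounds quoted in the text, treated here as exact ratios.
examples = {
    "methylated vs nonmethylated cap (>= 5-fold)": 5.0,
    "cap affinity, eIF4G-bound vs free eIF4E (>= 10-fold)": 10.0,
}
for label, fold in examples.items():
    print(f"{label}: at least {ddg_from_fold_change(fold):.2f} kcal/mol")
```

At 25 °C this gives roughly 0.95 and 1.36 kcal/mol; because both ratios are lower bounds, the true energy differences may be larger.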

Each of the subdomains of eIF4G that bind to eIF4E, Pab1p and other proteins has recently been mapped (Fig. 1b)5,9,15,16,17,18,19,20. The best characterized region of eIF4G is its eIF4E-binding domain. Sequences homologous to this domain have been identified in other eIF4E-binding proteins (Fig. 1b)15,21. Peptides as short as 15 amino acids from this domain can bind to eIF4E with nanomolar affinity13. Recent structural work has addressed how eIF4E interacts with its binding partners and how related eIF4E-binding peptides recognize a common surface on eIF4E13.

The recruitment of eIF4G to mRNA is not only a result of its binding to the cap through eIF4E; eIF4G also associates with the poly(A) tail through Pab1p16. The simultaneous association of eIF4E and Pab1p with eIF4G allows for synergistic activation of translation initiation in vitro22. In vivo, mRNAs that are both capped and polyadenylated are also translated more efficiently than those that are only capped or only polyadenylated23. The molecular basis for this synergy is unknown.
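"Synergistic" here means that the combined effect of the cap and the poly(A) tail exceeds what the two features would give if their contributions simply added. A minimal sketch of that comparison follows, assuming an additive null model; the function name and the translation-efficiency values (arbitrary units) are hypothetical and are not data from the cited studies.

```python
def synergy_ratio(neither: float, cap_only: float, polya_only: float, both: float) -> float:
    """Observed output with both features divided by the additive expectation.

    Additive null model (an assumption): each feature adds its own increment
    over the baseline, so expected(both) = cap_only + polya_only - neither.
    A ratio > 1 indicates synergy; a ratio of 1 indicates simple additivity.
    """
    expected_additive = cap_only + polya_only - neither
    if expected_additive <= 0:
        raise ValueError("additive expectation must be positive")
    return both / expected_additive


# Hypothetical translation efficiencies in arbitrary units (not measured values).
ratio = synergy_ratio(neither=1.0, cap_only=10.0, polya_only=3.0, both=60.0)
print(f"synergy ratio: {ratio:.1f}")
```

Other null models (for example, a multiplicative one) are also used in the literature; the choice changes the numerical ratio but not the idea of comparing against an expectation without interaction.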

Pab1p binds with subnanomolar affinity to poly(A)24. It is the founding member of the largest class of RNA binding proteins25,26 and contains the so-called ribonucleoprotein (RNP) or RNA-recognition motif (RRM), a universal RNA recognition module27. All Pab1p proteins contain four RRM domains separated by highly conserved linkers28. The poly(A) binding activity of Pab1p resides within its two N-terminal RRM domains, located within the first 200 amino acids of the protein28,29,30. RRM1 and RRM2 are also responsible for its association with eIF4G19,31. The structural basis for the ability of the two N-terminal RRM domains of Pab1p to recognize short oligo(A) RNA has been elucidated32 and is discussed below. This structural work has also defined a putative eIF4G interaction surface on Pab1p.

The translation initiation machinery uses the eIF4E–cap and Pab1p–poly(A) interactions to associate eIF4G with mRNA. These interactions also provide a target for the regulation of protein synthesis. For instance, the amount of eIF4E available in the cell is regulated by titration with a class of proteins that act as competitive inhibitors of the eIF4E–eIF4G interaction, the eIF4E-binding proteins (4E-BPs) (Fig. 1a)33. The 4E-BPs are themselves subject to regulation by cellular kinases in response to signal transduction pathways34. Under growth-promoting conditions, these proteins become hyperphosphorylated and their affinity for eIF4E decreases. The structure of eIF4E bound to a fragment of 4E-BP1 has provided insight into how the 4E-BPs interact with eIF4E and how phosphorylation of the 4E-BPs may control their affinity for eIF4E13.
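Because the 4E-BPs compete with eIF4G for the same site on eIF4E, their effect can be pictured with a simple competitive-binding model. The following is a minimal sketch, assuming mutually exclusive 1:1 binding and both ligands in large excess over eIF4E. The function name, concentrations and dissociation constants are hypothetical, and a 100-fold rise in the 4E-BP Kd stands in for hyperphosphorylation.

```python
def fraction_eif4e_with_eif4g(eif4g: float, kd_eif4g: float,
                              bp: float, kd_bp: float) -> float:
    """Fraction of eIF4E bound to eIF4G when a 4E-BP competes for the same site.

    Assumes mutually exclusive 1:1 binding and that both ligands are in large
    excess over eIF4E, so their free concentrations equal their totals.
    All concentrations and Kd values must use the same units.
    """
    g_term = eif4g / kd_eif4g
    bp_term = bp / kd_bp
    return g_term / (1.0 + g_term + bp_term)


# Hypothetical values in micromolar, chosen only for illustration.
EIF4G_UM, KD_EIF4G_UM, BP_UM = 1.0, 0.1, 2.0
for state, kd_bp in [("hypophosphorylated 4E-BP", 0.1), ("hyperphosphorylated 4E-BP", 10.0)]:
    frac = fraction_eif4e_with_eif4g(EIF4G_UM, KD_EIF4G_UM, BP_UM, kd_bp)
    print(f"{state}: {frac:.2f} of eIF4E bound to eIF4G")
```

In this toy model, weakening 4E-BP binding shifts most of the eIF4E from the 4E-BP to eIF4G. That is the qualitative effect described above; real cellular concentrations and affinities will differ.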

Structure of the eIF4E–m7GDP complex

The three-dimensional structure of mouse eIF4E bound to the cap analog m7GDP (ref. 14) and the subsequent structure of the yeast complex12 provided the first detailed view of a eukaryotic translation initiation factor (Fig. 2a). Together with several structures of viral capping enzymes35,36,37,38, these structures revealed how the cap is specifically recognized by eIF4E. They also led to key predictions about how phosphorylation of eIF4E could enhance its affinity for mRNA, and how eIF4G could bind to eIF4E.

Figure 2: The cap binding protein eIF4E provides two binding surfaces for the methylated cap and peptides derived from eIF4G and the 4E-BPs.

a, The β-sheet surface of eIF4E provides a binding site for the methylated guanosine cap, which intercalates in a tight binding cleft between two universally conserved Trp residues. The Watson–Crick face of the guanosine interacts with an electronegative (red) patch, whereas the diphosphate moiety points towards a basic patch on the surface of the protein (blue) that defines the likely path for the RNA. Ser 209 and Lys 159 lie on opposite sides of this path. b, The α-helical surface of eIF4E contains conserved hydrophobic and acidic residues (highlighted in yellow) that form two distinctive patches. Peptides derived from eIF4G and the 4E-BPs (the 4E-BP1 peptide is shown) bind to this conserved dorsal surface of eIF4E. Four amino acids (Tyr 0, Phe +4, Leu +5 and Φ +6, where Φ is any hydrophobic residue; see Fig. 1b) that are nearly universally conserved among eIF4G and 4E-BP proteins are highlighted.

The cap binding protein has an αβ structure and its shape resembles a baseball glove12,14. The αβ fold is very common among RNA binding proteins39, but the sequence of eIF4E shares no homology with other proteins, including the components of the nuclear cap binding complex that recognizes the cap in the nucleus before mRNA export to the cytoplasm40. The eIF4E protein contains a curved, eight-stranded antiparallel β-sheet that forms the cap binding site on its concave face, while the convex back of the sheet is covered by three long α-helices. Two short helices within the loops bridging the β1–β2 and the β3–β4 strands of the β-sheet contain conserved Trp residues involved in cap binding. The loops immediately preceding the Trp-containing helices retain some mobility in the complex with the cap analog12, suggesting that, in the absence of the cap, these regions of the protein may be disordered.

The methylated guanosine binds a narrow hydrophobic cleft by intercalating between Trp 56 (in helix α1) and Trp 102 (in helix α3; Fig. 2a). The favorable binding energy provided by these stacking interactions is reinforced by van der Waals interactions between these Trp residues and the guanosine sugar. The N7 methyl group is directed towards the interior of the protein, where it contacts a third conserved Trp residue (Trp 166). The side chain carboxylate of Glu 103 recognizes the guanosine amino and imino groups. This residue is conserved in all eIF4E proteins, and its mutation to Ala in human eIF4E abrogates function6. The diphosphate moiety of the cap analog forms electrostatic interactions with conserved Arg and Lys residues.

The structural principles underlying the recognition of methylated bases and the discrimination against unmethylated analogs were first illustrated by the complex between the vaccinia virus VP39 capping enzyme and the cap analog m7GpppG36, and were confirmed by the eIF4E structures12,14. In both the VP39 and eIF4E structures, the Watson–Crick face of the guanosine is recognized specifically through multiple hydrogen bonding interactions, while the methylated base is sandwiched between aromatic side chains. These stacking interactions play a dominant role in cap binding38 and therefore in eIF4E function. Changing any of the stacking aromatic residues to Ala in eIF4E or VP39 leads to a complete loss of function, whereas changes to other aromatic residues result in partial function14,38. In addition, the stacking interactions provide a means of discriminating between methylated and unmethylated bases: methylation renders the π-orbitals of the base electron deficient, so that they interact favorably with the electron-rich π-orbitals of the Trp indole41. Stacking or sandwiching of methylated bases between aromatic side chains has been observed in solution and in the crystal structures of numerous small molecules36. Furthermore, the structure of a viral capping enzyme bound to the unmethylated cap analog pGGGp shows stacking of the unmethylated base against a single aromatic side chain, instead of the characteristic sandwich observed in proteins that recognize the methylated base35.

Binding of mRNA to eIF4E is not affected by the identity of the base following the cap. However, binding of a longer cap analog (m7GpppA versus m7Gpp) increases the rigidity of helix α6 and causes numerous chemical shift changes in the adjacent loop. These regions are likely to lie within the mRNA binding site, which is contiguous with the cap binding slot (Fig. 2a)12. Treatment of cells with growth factors, hormones and mitogens stimulates translation and phosphorylation of Ser 209 in mammalian eIF4E (reviewed in ref. 33). Ser 209 is located along the edge of the putative mRNA binding cleft (Fig. 2a). Phosphorylated Ser 209 may interact with Lys 159 (located on the opposite edge of the cleft) to form a bridge over the mRNA binding cleft near its entrance14. This clamp could stabilize the cap–eIF4E complex and the association of the recruited mRNA with the translation initiation apparatus.

Further insight into how eIF4E might recognize the rest of the mRNA came from a structure of VP39 in complex with a capped single-stranded RNA hexamer37. In this complex, the six nucleotides form two groups of three stacked bases with roughly helical geometry, separated by a sharp turn in the phosphate backbone between the third and fourth nucleotides. Only the first three nucleotides interact directly with VP39, primarily through hydrogen bonds and salt bridges with the sugar-phosphate backbone. Thus, the protein appears to recognize, in a sequence-independent manner, the conformation of a helical trimer of stacked bases.

Complexes of eIF4E with eIF4G and 4E-BP1 fragments

The convex dorsal surface of eIF4E lies opposite the cap binding site. The binding site for eIF4G was originally predicted to be on this surface of eIF4E based on two separate observations. First, the addition of mammalian 4E-BP2 to yeast eIF4E resulted in marked changes in 20 amide crosspeak resonances clustered on the dorsal surface of eIF4E12. Second, this same region was shown to contain hydrophobic residues conserved in all known eIF4E proteins8,14 (Fig. 2b). These observations identified this region as the likely binding site for the hydrophobic residues common to eIF4G and the 4E-BPs (Fig. 1b). Extensive genetic and biochemical studies lend strong support to this hypothesis8,42.

The eIF4E binding region of eIF4G spans 100 amino acids43, but a stretch of only 10 amino acids shared by eIF4G and the 4E-BPs provides most of the interaction energy (Fig. 1b)13. In fact, 15-mer peptides derived from the eIF4E binding region of eIF4G and the 4E-BPs competitively inhibit the eIF4E–eIF4G interaction and translation initiation13. The X-ray crystal structures of the ternary complexes between eIF4E, m7GDP and eIF4E-binding peptides revealed the details of these interactions13. Ternary complexes were obtained with peptides derived from both eIF4G2 (one of two isoforms of eIF4G) and 4E-BP1. Both peptides adopt the same L-shaped structure, with an extended stretch preceding a short α-helix. Each binds to the same dorsal surface of eIF4E and makes very similar interactions with invariant or highly conserved eIF4E side chains (Fig. 2b). An invariant Tyr in the peptide immediately precedes the beginning of the α-helix and makes multiple van der Waals contacts with nearly invariant eIF4E residues. An invariant Leu in the peptide and the hydrophobic residue that follows it (Fig. 1b) interact with exposed hydrophobic side chains, as well as with each other and with the invariant Tyr.
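The shared stretch corresponds to the minimal consensus shown in Fig. 1b: an invariant Tyr at position 0, any four residues, Leu at +5 and a hydrophobic residue (Φ) at +6. The following is a minimal sketch of a scan for that pattern with a regular expression. The hydrophobic set, the function name and the test strings are illustrative assumptions, not database sequences; the sketch does not require Phe at +4, and a motif match alone would not establish eIF4E binding.

```python
import re

# Assumed hydrophobic set for the Phi position (illustrative choice).
HYDROPHOBIC = "AVILMFW"
# Minimal consensus: Tyr at 0, any four residues, Leu at +5, Phi at +6.
# The lookahead lets overlapping candidates be reported.
MOTIF = re.compile(rf"(?=(Y.{{4}}L[{HYDROPHOBIC}]))")


def find_eif4e_binding_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based index of the Tyr, matched 7-residue stretch) for each hit."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Hypothetical peptide strings, not database sequences.
print(find_eif4e_binding_motifs("GGYDRKFLMGG"))  # Met follows Leu +5: a hit
print(find_eif4e_binding_motifs("GGYDRKFLGGG"))  # Gly at +6 is not hydrophobic: no hit
```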

The structures of eIF4E bound to m7GDP and the eIF4G2 or 4E-BP1 peptides clarify biochemical data derived from studies of these complexes. They suggest that the bulk of the energy responsible for the high binding affinity between eIF4E and either eIF4G or the 4E-BPs results from nearly identical interactions with eIF4E. Because the two proteins share no sequence homology outside the eIF4E binding site, the 4E-BPs appear to function by mimicking eIF4G, an apparent example of convergent evolution. However, additional interactions between eIF4E and either eIF4G or the 4E-BPs must exist to confer differential affinity and specificity. Genetic and biochemical experiments have shown that the binding sites for eIF4G and 4E-BP on eIF4E are not identical; although they share a core set of amino acids, a subset of mutations that decrease eIF4G binding do not affect 4E-BP binding8,42. Also, regions outside the eIF4E-binding peptide may interact with eIF4E, since the entire C-terminal half of 4E-BP1 and a larger fragment of eIF4G were protected from proteolytic degradation when bound to eIF4E13,43.

Another important question that can be addressed in the context of these structures concerns how phosphorylation of the 4E-BPs leads to their decreased affinity for eIF4E. A site of phosphorylation on 4E-BP1 (Ser 65) lies just C-terminal to the ordered region observed in the crystal structure. This residue is likely to be located near an acidic patch in eIF4E. Therefore, electrostatic repulsion between phosphorylated Ser 65 in 4E-BP1 and the conserved Glu and Asp residues in eIF4E has been postulated to disrupt the 4E-BP1–eIF4E complex13.

Questions that remain unanswered are how binding of eIF4G to eIF4E increases the affinity of eIF4E for the cap, and how mutations in eIF4E surface residues that reduce the eIF4E–eIF4G interaction also reduce cap–eIF4E binding7,8,42. The structures of the binary eIF4E–m7GDP and the ternary eIF4E–m7GDP–eIF4G complexes are nearly identical13,14. This indicates that the cap–eIF4E interaction is not perturbed by the presence of eIF4G or the 4E-BP peptides. Furthermore, the peptides bind 35 Å away from the cap binding site and do not interact directly with the cap or with residues that contact the cap. One possible explanation for how eIF4G and the 4E-BPs alter the affinity of eIF4E for the cap is that regions of eIF4G and 4E-BP1 not included in the crystal structure contribute to cap binding by altering the structure of eIF4E. An alternative (or additional) explanation originates from the observation that binding of 4E-BP and eIF4G to eIF4E occurs by induced fit43,44,45, and that the loops within eIF4E that bind the cap retain conformational flexibility in the complex12. Perhaps apo eIF4E has a poorly defined m7GDP binding site, and binding of eIF4G and/or m7GDP stabilizes the eIF4E structure, leading to better definition of the cap binding site and increased m7GDP and/or eIF4G binding. Phosphorylation of eIF4E could also affect m7GDP and eIF4G binding by stabilizing the eIF4E structure in a conformation competent to bind m7GDP and eIF4G. The validity of this model remains to be tested by systematic comparisons of the structure and dynamics of eIF4E when free or bound to the cap and other proteins.
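Whatever the structural mechanism turns out to be, the linkage has a thermodynamic consequence. If eIF4G binding raises the affinity of eIF4E for the cap, then cap binding must raise the affinity of eIF4E for eIF4G by the same factor, because the four equilibria form a closed cycle. The following is a minimal sketch of that cycle, assuming simple 1:1 equilibria; the function name and the dissociation constants are hypothetical.

```python
def kd_eif4g_for_cap_complex(kd_cap_free: float, kd_cap_with_eif4g: float,
                             kd_eif4g_free: float) -> float:
    """Kd of eIF4G for the eIF4E-cap complex implied by a closed thermodynamic cycle.

    For E + cap + G with 1:1 binding, detailed balance requires
        Kd_cap(free E) * Kd_G(E-cap) == Kd_G(free E) * Kd_cap(E-G),
    so the fourth constant follows from the other three.
    """
    return kd_eif4g_free * kd_cap_with_eif4g / kd_cap_free


# Hypothetical constants (micromolar): eIF4G tightens cap binding 10-fold.
kd_cap_free, kd_cap_with_g, kd_g_free = 1.0, 0.1, 0.05
kd_g_with_cap = kd_eif4g_for_cap_complex(kd_cap_free, kd_cap_with_g, kd_g_free)
print(f"cap enhancement:   {kd_cap_free / kd_cap_with_g:.1f}-fold")
print(f"eIF4G enhancement: {kd_g_free / kd_g_with_cap:.1f}-fold")
```

Both enhancements come out equal, which is why a model in which eIF4G and m7GDP each stabilize a binding-competent eIF4E conformation predicts mutual enhancement of the two interactions.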

Interaction between Pab1p and oligo(A)

The structure of two RRMs of Pab1p bound to oligo(A) is the most recent addition to the growing list of structures of components of the eukaryotic translation initiation apparatus32. Among other structures of RRM–RNA complexes, those of human U1A protein46,47 are the best studied. The U1A structures demonstrated how a single RRM recognizes structured RNA, while two other structures revealed how single-stranded RNA can be recognized by proteins containing two RRMs arranged in tandem. These are the cocrystal structures of Drosophila Sex-lethal bound to a U-rich sequence derived from a splicing regulatory element48, and of heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1) bound to a purine-rich telomeric DNA sequence49.

Like hnRNP A1 and Sex-lethal, Pab1p requires two RRMs in order to bind poly(A). The first two RRMs of Pab1p are responsible for specific, high affinity binding to oligo(A) and for binding to eIF4G19,28,29,30,31. In the crystal structure of the Pab1p–oligo(A) complex32, the single stranded oligo(A) RNA adopts an extended conformation running through a binding surface lined by the antiparallel β-sheet of each domain and backed on the opposite surface by four α-helices (Fig. 3a,b). The highly conserved linker between RRM1 and RRM2 of Pab1p also contacts the RNA.

Figure 3: The poly(A) binding protein Pab1p provides two binding surfaces for poly(A) and eIF4G.

a, Many RNA-binding proteins that bind single-stranded RNA use two contiguous RRM domains. The structures of Pab1p, Sex-lethal and hnRNP A1 show that the arrangement of these domains differs in each of the three proteins, and that the RNA interacts differently with the protein β-sheet surface in each case. Sex-lethal binds U-rich sequences by creating a narrow cleft between RRM1 and RRM2, whereas in Pab1p, oligo(A) interacts with an extended β-sheet surface created by the contiguous arrangement of RRM1 and RRM2. In both Pab1p and Sex-lethal, the RNA follows a similar path along the two RRMs: the 5′ half of the RNA interacts with RRM2, whereas the 3′ half interacts with RRM1. b, The RNA binding surface of Pab1p, with eight well-ordered adenosine residues interacting with the extended β-sheet surface defined by RRM1 (dark blue) and RRM2 (light blue) and the conserved linker joining the two domains (purple). c, The opposite, dorsal surface of Pab1p contains conserved hydrophobic and acidic residues (in red). These residues form two distinctive patches on the dorsal surface of the Pab1p structure and originate from both RRM1 (D70 and F74) and RRM2. This surface is likely to provide the binding site for eIF4G, but the structural details of this interaction remain to be elucidated.

With the Pab1p–oligo(A) structure, there are now three distinct examples of nucleic acid recognition by multidomain RRM proteins48,49. In each case, two independently folded RNA binding domains act synergistically to increase affinity and specificity for single-stranded RNA. The binding surface is created by simultaneous interactions of the nucleic acid with each of the two domains, as well as with the inter-RRM linker sequence, whose helical structure is stabilized by the nucleic acid. An RNA-stabilized α-helix in the C-terminus of human U1A protein also plays an important role in determining RNA specificity46. The interdomain arrangement in the Pab1p–oligo(A) complex, however, is very different from those observed in both the Sex-lethal complex (a V-shaped cleft)48 and the hnRNP A1 complex49 (Fig. 3a). The binding cleft between RRM1 and RRM2 is much narrower in the Sex-lethal–U-rich RNA complex than in the Pab1p–oligo(A) complex (Fig. 3a)32,48, allowing for some discrimination based on the different sizes of purines and pyrimidines.

In the Pab1p complex, the arrangement of RRM1 and RRM2 is stabilized by inter-RRM contacts. However, the interface area defined by these interactions is small (only 550 Å², whereas the surface area buried by protein–RNA contacts is 2,600 Å²). Therefore, RNA–protein interactions are likely to be important in defining the relative orientation of RRM1 and RRM2. It is also likely that the RNA-induced orientation of neighboring RRMs is important in defining the binding specificity of these proteins.

The full-length Pab1 protein was shown by titration and nuclease protection experiments to cover 25 nucleotides of poly(A), but its primary binding site is 11–12 nucleotides long24. Eight adenines are visible in the electron density map of the Pab1p–oligo(A) complex (Fig. 3b). Adenosine recognition by RRM1 and RRM2 of Pab1p is mediated in part by contacts with conserved residues in the two central strands of the β-sheet of each RRM (which contain the RNP-1 and RNP-2 sequences). Some of these residues take part in intermolecular stacking interactions analogous to those observed in all structures of complexes involving other RNP proteins. These interactions are thus likely to be universal features of RRM–RNA recognition27. Mutation of the residues involved in these interactions, for example Phe 142 in yeast Pab1p, disrupts high-affinity binding of yeast Pab1p to poly(A)29. Contacts with conserved residues provide the basal RNA binding activity of the RRM27, which would explain why the path of the RNA across the β-sheet surface is similar in the reported structures of RRM–RNA complexes.
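The 25-nucleotide footprint gives a simple upper bound on how many Pab1p molecules a poly(A) tail can accommodate. The following is a minimal sketch, assuming that each bound Pab1p occupies a contiguous, non-overlapping 25-nucleotide footprint; the function name and tail lengths are hypothetical, and the sketch ignores cooperativity, partial occupancy and the shorter 11–12-nucleotide primary site.

```python
def max_pab1p_per_tail(tail_length_nt: int, footprint_nt: int = 25) -> int:
    """Upper bound on Pab1p molecules on a poly(A) tail.

    Assumes each molecule occupies a contiguous, non-overlapping footprint
    (25 nt, the value measured for full-length Pab1p); ignores partial occupancy.
    """
    if tail_length_nt < 0 or footprint_nt <= 0:
        raise ValueError("lengths must be non-negative and footprint positive")
    return tail_length_nt // footprint_nt


# Hypothetical tail lengths, for illustration only.
for length in (24, 70, 250):
    print(f"{length}-nt tail: at most {max_pab1p_per_tail(length)} Pab1p")
```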

Conformational preferences resulting from the strong tendency of adenine residues to form planar stacking interactions are likely to play an important role in the binding specificity of Pab1p for poly(A). Extensive intramolecular stacking interactions between adenine residues are observed in the Pab1p complex (Fig. 3b), whereas only one intramolecular stacking interaction is observed in the Sex-lethal complex. Other interactions observed in the Pab1p complex involve residues that are not conserved between different RRM domains. These presumably provide interactions specific to Pab1p. They include contacts with the phosphodiester backbone and the ribose sugars (seven of the eight sugars in the binding cleft contact protein residues). In addition, direct sequence-specific contacts with the purine bases (Fig. 3b) are provided by amino acids in loop 3 of each RRM (connecting β2 and β3), in the linker between RRM1 and RRM2, and in the C-terminal tail of RRM2.

The pattern of sequence conservation in Pab1p has enabled speculation about the structure of RRM3 and RRM4 of Pab1p and their interaction with poly(A)32. The majority of residues participating in RNA recognition are conserved between RRM1 and RRM3 and between RRM2 and RRM4, but not between RRM1 and RRM4 or between RRM2 and RRM3. This conservation suggests that RRM3 and RRM4 may form an RNA binding surface similar to that formed by RRM1 and RRM2. The reduced affinity and specificity of RRM3 and RRM4 for poly(A)29 is likely to be a consequence of sequence divergence in RNA binding residues, including the linker region connecting the domains. Future work will address whether and how the poly(A) tail interacts simultaneously with all four RRMs of Pab1p.
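The conservation argument rests on comparing RNA-contact positions across the four domains. The following is a minimal sketch of that comparison as percent identity over pre-aligned positions. The four short segments and the function name are hypothetical placeholders, chosen only to mimic the reported RRM1/RRM3 and RRM2/RRM4 pattern; they are not real Pab1p sequences.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over pre-aligned positions, skipping gap characters ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical aligned RNA-contact segments (placeholders, not real Pab1p sequence).
segments = {
    "RRM1": "KGYAFVEFRS",
    "RRM2": "RNWTLISEGH",
    "RRM3": "KGYAFVEFQT",
    "RRM4": "RNWTLVSEGA",
}
for (name_a, seq_a), (name_b, seq_b) in combinations(segments.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_identity(seq_a, seq_b):.0f}%")
```

With real data, the segments would come from a multiple sequence alignment restricted to the residues that contact RNA in the Pab1p–oligo(A) structure.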

Predictions on the Pab1p–eIF4G complex

The α-helical dorsal surface of Pab1p is a phylogenetically conserved hydrophobic-acidic region (Fig. 3c). As with the conserved dorsal surface of eIF4E and its role in eIF4G/4E-BP binding, this conserved region has been predicted to bind eIF4G32. In support of this suggestion, mutagenesis of several residues within this region has already demonstrated its importance for eIF4G binding50. One interesting feature of this surface is that it spans two RRMs that are very likely to be juxtaposed as a result of oligo(A) binding (Fig. 3c). If the eIF4G binding site does span the two RRMs, then perhaps yeast but not mammalian Pab1p requires poly(A) for eIF4G binding16,19 because yeast eIF4G by itself cannot stabilize RRM1–RRM2 packing, whereas mammalian eIF4G can.

The structure of the ternary complex containing Pab1p RRM1 and RRM2, oligo(A) and a suitable fragment of eIF4G is anxiously awaited. Key questions that this structure would address are how eIF4G binds specifically to Pab1p, and whether Pab1p stabilizes a unique eIF4G conformation. Similarly, it will be essential to determine whether binding of eIF4G to Pab1p changes the Pab1p–oligo(A) interface. More general questions include whether Pab1p repressors akin to the 4E-BPs can be identified by sequence predictions that use the Pab1p–eIF4G interface, once observed, as a guide. Finally, seeing both Pab1p and eIF4E bound to a large fragment of eIF4G would present unprecedented opportunities for understanding how the different components of a translation pre-initiation complex interact with each other.

Conclusions and perspectives

Translational control is a major regulatory step in normal and abnormal cell growth. The structures of translation initiation factors bound to mRNA and protein fragments have provided critical insight into how mRNAs are delivered to the ribosome. We have now begun to understand how the cap and the poly(A) tail are recognized, and how surfaces of the cap and poly(A) binding proteins function in protein recognition. We still do not know the structure of most translation initiation factors, and thus it is still very difficult to visualize the mechanisms underlying translational regulation. Clearly, we have just begun to scratch the surface of the bewildering beauty and complexity of the apparatus controlling the initiation of protein synthesis.