An extended U2AF65–RNA-binding domain recognizes the 3′ splice site signal

How the essential pre-mRNA splicing factor U2AF65 recognizes the polypyrimidine (Py) signals of the major class of 3′ splice sites in human gene transcripts remains incompletely understood. We determined four structures of an extended U2AF65–RNA-binding domain bound to Py-tract oligonucleotides at resolutions between 2.0 and 1.5 Å. These structures together with RNA binding and splicing assays reveal unforeseen roles for U2AF65 inter-domain residues in recognizing a contiguous, nine-nucleotide Py tract. The U2AF65 linker residues between the dual RNA recognition motifs (RRMs) recognize the central nucleotide, whereas the N- and C-terminal RRM extensions recognize the 3′ terminus and third nucleotide. Single-molecule FRET experiments suggest that conformational selection and induced fit of the U2AF65 RRMs are complementary mechanisms for Py-tract association. Altogether, these results advance the mechanistic understanding of molecular recognition for a major class of splice site signals.


Sequence conservation of U2AF65 inter-RRM regions: (a) N-terminal RRM1 extension, (b) inter-RRM linker, (c) C-terminal RRM2 extension. The sequences of twenty known and probable U2AF65 orthologues were aligned using Clustal Omega (reference 3) and by visual inspection using secondary structure prediction. Human U2AF65 secondary structure elements are indicated above the aligned sequences, which are colored by sequence identity as follows: >90% dark green, >80% green, >70% yellow. The secondary structure of the U2AF65 1,2L structure is indicated above: rectangle, α-helix; line, coil; arrow, β-strand; elements are colored blue for new regions (this work) and black for secondary structures assigned in reference 4. The new N- and C-terminal α-helices are labeled α-N and α-C, respectively. Ribonucleoprotein consensus motifs (RNP1 and RNP2)
Comparison of U2AF65 bound to uracil versus cytosine at the ninth binding site. Comparison among the structures shows how U2AF65 adapts to uracil versus cytosine at the ninth binding site (Fig. 3g-h). When interacting with uracil in structure iii, the D231 carboxylate accepts a hydrogen bond from the dU9-N3H. Following substitution to cytosine in structure iv, the D231 side chain rotates to accept a hydrogen bond from the rC9-N4H2 exocyclic amine. The U2AF65 1,2L structures iii and iv bound to dU9 and rC9 were determined at near-physiological pH (pH 7.0). At low pH (pH 4.2 in structure i), the protonated D231 side chain shifts to donate a hydrogen bond to the rU9-O4 and to accept one from the dU8-N3H of the preceding nucleotide (Supplementary Fig. 3i).
Addition of the bulky bromine to the preceding BrdU8 base unstacks and relocates the dU9 uracil (in structure ii, Supplementary Fig. 3j), which could explain the preference of U2AF65 to bind BrdU at the seventh site (Fig. 2a and reference 6). Otherwise, the sugar moiety occupies similar positions among the rC9, rU9, and dU9 nucleotides preceded by dU8, indicating that the dU9 of structure iii is likely to accurately reflect the conformation of rU9 at neutral pH. The unfavorable proximity of the D231 side chain to the rU8-O4 is consistent with the preference of a valine substitution (U2AF65-D231V) to bind uridines at the eighth site (reference 7).
A primary constraint for the RNA binding functions of the U2AF65 inter-RRM linker appears to be the capacity of the protein backbone to form hydrogen bonds with the bound nucleobase. A single mutation (V254P) abolishes hydrogen bond formation between the linker backbone atoms and the central nucleotide and significantly reduces U2AF65-RNA affinity. In contrast, little consequence for RNA binding is observed following up to 12 substitutions of U2AF65 linker residues with glycine, which retains the backbone capacity for hydrogen bond formation with the central nucleotide.
The sequence-insensitive intra-molecular contacts of the U2AF65 inter-RRM linker agree with its low primary sequence conservation (Supplementary Fig. 8). Within the central portion of the linker, the V254 residue that directly recognizes the Py tract is the sole linker residue that is nearly identical among documented U2AF65 homologues. The D256 residue that indirectly contributes to the fourth binding site is a second rare example of a highly conserved U2AF65 inter-RRM linker residue, for which a similar change to E is the only common variation among homologues from multicellular organisms. A few conserved residues that serve structural roles in the linker conformation include R228, P229, Y232, and P234 at the initial turn abutting RRM1, and G248 at the apex of the linker between RRM1 and RRM2. Otherwise, the inter-RRM linker sequence composition is variable among U2AF65 homologues (Supplementary Fig. 8b). The RNA-interacting residues of the U2AF65 RRM extensions also are highly conserved (Supplementary Fig. 8a,c). For example, the R146 residue that recognizes the terminal pyrimidine base is strictly conserved among U2AF65 homologues from multicellular organisms with the exception of Caenorhabditis elegans, which may be related to the unusually short, consensus Py tract of this organism. Likewise, U2AF65-Q147 is either conserved or replaced by a histidine residue that is capable of similar hydrogen bond interactions. These selective trends in sequence conservation support the importance of these U2AF65 contacts and conformation for recognition of the central Py tract nucleotides.
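The identity-based coloring described in the conservation figure legend (>90% dark green, >80% green, >70% yellow) can be illustrated with a minimal Python sketch. Only the three thresholds come from the legend. How identity was actually computed for the figure is not stated, so the definition used here (the fraction of aligned sequences sharing the most common residue in a column, with gaps counted as mismatches) is an assumption, and the function names and the toy alignment are hypothetical.

```python
from collections import Counter

# Thresholds from the figure legend; the rest of this sketch is an assumption.
BINS = [(0.90, "dark green"), (0.80, "green"), (0.70, "yellow")]


def column_identity(column):
    """Fraction of sequences sharing the most common non-gap residue.

    Gaps ('-') stay in the denominator, so they count as mismatches.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column)


def color_for(identity):
    """Map an identity fraction to the legend's color bins."""
    for threshold, color in BINS:
        if identity > threshold:
            return color
    return "uncolored"


def color_alignment(aligned):
    """Return one color label per alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [color_for(column_identity(col)) for col in zip(*aligned)]


# Hypothetical toy alignment of 20 sequences and 4 columns (not real U2AF65 data).
toy = [
    "V"
    + ("D" if i < 17 else "E")
    + ("P" if i < 15 else "A")
    + ("K" if i < 10 else "R")
    for i in range(20)
]
colors = color_alignment(toy)
```

In this toy case the four columns have identities of 100%, 85%, 75%, and 50%, so they fall into the dark green, green, yellow, and uncolored bins, respectively. A nearly invariant position such as V254 would land in the top bin under this definition, whereas variable linker positions would remain uncolored.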
Within limits, the length of the inter-RRM linker also varies among U2AF65 homologues (Supplementary Fig. 8b). The length of the human U2AF65 linker lies at a midpoint of approximately 30 residues. In comparison, the inter-RRM linkers of Kluyveromyces lactis and Saccharomyces cerevisiae U2AF65 homologues are unusually short (~17 residues). Notably, the yeast S. cerevisiae lacks alternative splicing and is set apart from other fungi by a clear U-enrichment near the 3′ end of introns (reference 9). Conversely, the inter-RRM linkers of U2AF65 homologues from multicellular plants are unusually long (e.g. 40 residues in Arabidopsis thaliana U2AF65), show an increased content of serine residues, which can serve as handles for post-translational modification, and typically replace the V254 counterpart with glycine. These differences suggest divergent and potentially regulated modes of U2AF65-Py tract recognition in plants, for which alternative splicing serves key functions in development, defense, and stress response (references 10, 11).