The structure and substrate specificity of human Cdk12/Cyclin K

Phosphorylation of the RNA polymerase II C-terminal domain (CTD) by cyclin-dependent kinases is important for productive transcription. Here we determine the crystal structure of Cdk12/CycK and analyse its requirements for substrate recognition. Active Cdk12/CycK is arranged in an open conformation similar to that of Cdk9/CycT but different from those of cell cycle kinases. Cdk12 contains a C-terminal extension that folds onto the N- and C-terminal lobes, thereby contacting the ATP ribose. The interaction is mediated by an HE motif followed by a polybasic cluster that is conserved in transcriptional CDKs. Cdk12/CycK showed the highest activity on a CTD substrate prephosphorylated at position Ser7, whereas the common Lys7 substitution was not recognized. Of the inhibitors tested, flavopiridol was the most potent towards Cdk12, but it was still about 10-fold more potent towards Cdk9. T-loop phosphorylation of Cdk12 required coexpression with a Cdk-activating kinase. These results suggest the regulation of Pol II elongation by a relay of transcriptionally active CTD kinases.

Cell division and transcription are tightly regulated by cyclin-dependent kinases (CDKs) and their regulatory cyclins. Whereas Cdk2/Cyclin A (CycA) and Cdk4/Cyclin D (CycD) are major players involved in the regulation of the cell cycle, five kinases (Cdk7, Cdk8, Cdk9, Cdk12 and Cdk13) have now been described as transcription-regulating kinases 1. They phosphorylate the C-terminal domain (CTD) of RNA polymerase II (Pol II), thereby regulating different phases of the transcription cycle from transcription initiation to elongation and termination 2. The CDKs and their corresponding cyclins form specific complexes including the Cdk7/CycH/Mat1 complex, which is part of the general transcription factor TFIIH, the Cdk8/CycC kinase module of the Mediator complex, the Cdk9/CycT complex that constitutes the active form of the positive transcription elongation factor (P-TEFb) and the recently discovered metazoan kinases Cdk12 and Cdk13, which both associate with Cyclin K (CycK).
The transcription-associated kinases phosphorylate serine residues within the hepta-repeat sequence Y1S2P3T4S5P6S7 of Rpb1, the largest subunit of RNA Pol II. The number of hepta-repeats varies from 26 in yeast to 52 in mammals, with some variation in the distal part of the CTD; among the variant repeats, ones with a lysine at position 7 are most common. The CTD is essential for cell viability, and partial truncations or site-specific mutations lead to specific growth defects 3 and defects in recruitment of mRNA-processing machinery 4. It is thought to act as a scaffold to coordinate the binding of proteins involved in the different phases of transcription and to couple transcription with other nuclear processes such as mRNA maturation and the modification of chromatin. The three serines (Ser2, Ser5 and Ser7), the threonine (Thr4) and the tyrosine (Tyr1) of each repeat can be phosphorylated 5,6. In addition, both peptidyl-prolyl bonds (Pro3 and Pro6) can be isomerized, the distal lysines were shown to be reversibly acetylated 7 and an arginine at position 7 in a degenerate repeat can be methylated 8. Combinations of these reversible CTD modifications spanning multiple repeats allow for a multitude of different states, which led to the hypothesis of a CTD code 9.
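To give a sense of scale for the "multitude of different states", the following is a minimal counting sketch. It assumes, purely for illustration, that every repeat is a consensus repeat and that each phosphorylation site and each peptidyl-prolyl bond toggles independently between two states; acetylation, methylation and variant repeats are ignored, so the result is a naive upper bound and not a biological estimate. The function names are hypothetical.

```python
# Naive upper bound on CTD modification states.
# Assumption (illustrative only): all sites switch independently and all
# repeats have the consensus sequence Y1S2P3T4S5P6S7.

PHOSPHO_SITES = ["Tyr1", "Ser2", "Thr4", "Ser5", "Ser7"]  # phosphorylated or not
PROLYL_BONDS = ["Pro3", "Pro6"]                           # cis or trans


def states_per_repeat(n_phospho=len(PHOSPHO_SITES), n_prolyl=len(PROLYL_BONDS)):
    """Number of on/off combinations within one hepta-repeat."""
    return (2 ** n_phospho) * (2 ** n_prolyl)


def states_for_ctd(n_repeats):
    """Combinations across n_repeats independent consensus repeats."""
    return states_per_repeat() ** n_repeats


if __name__ == "__main__":
    print("states per repeat:", states_per_repeat())  # 2^5 * 2^2 = 128
    for n_repeats in (26, 52):  # repeat counts quoted for yeast and mammals
        print(f"{n_repeats} repeats: about {states_for_ctd(n_repeats):.2e} states")
```

Even under these simplifications, a single repeat already has 128 possible states, which is why patterns spanning several repeats can in principle encode a very large number of distinct signals.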
The RNA Pol II transcription cycle starts with the recruitment of the unphosphorylated polymerase and the formation of the preinitiation complex. At the transcription start site (TSS), an increase in Ser7 and Ser5 phosphorylation is observed. While the signal for phosphorylated Ser7 (pSer7) remains high throughout the transcription cycle, the pSer5 signal decreases steadily towards the poly-adenylation (poly-A) site. Conversely, the content of pSer2 and pThr4 marks is low at the TSS but increases downstream. The signals for pSer2 are the highest at and downstream of the poly-A site, consistent with the recruitment of 3′-RNA processing factors by Ser2-phosphorylated CTD. High levels of pTyr1 marks in the body of genes in yeast instead favour the binding of elongation factors and prevent binding of termination factors to the CTD 10. These phospho marks are removed by phosphatases during the termination process. It is a matter of debate whether the CTD phosphorylation cycle is uniform across all genes, or whether it is established in a gene-specific manner as suggested by two studies 11,12. Another study showed that the phosphoserine marks are set and removed depending on their distance from the TSS and termination site, respectively, with no significant differences among genes 13. A systematic approach examining the genome-wide distribution of CTD modifications indicated considerable interplay between CTD kinases and phosphatases, suggesting that transcription operates in a uniform mode at virtually all genes 14.
The identification of Cdk12 and Cdk13 as CTD kinases in metazoans 15–17 seemed to resolve a long-standing conundrum: the two yeast species Saccharomyces cerevisiae (S.c.) and Schizosaccharomyces pombe (S.p.) each have two CDKs required primarily during transcript elongation (Bur1 and Ctk1 in S.c., and Cdk9 and Lsk1 in S.p.), whereas metazoans were thought to have only one (Cdk9). Whereas some Ser2 and Ser5 phosphorylation is ascribed to Bur1 and Cdk9 in budding and fission yeast, respectively, phosphorylation of Ser2 in vivo seems to be mostly due to the non-essential CDKs, Ctk1 in S. cerevisiae and Lsk1 in S. pombe 18,19. Interspecies sequence alignments of kinase domains indicate that Bur1 and fission yeast Cdk9 are orthologues of metazoan Cdk9, whereas Ctk1 and Lsk1 are orthologues of metazoan Cdk12 and the closely related Cdk13. At 1,490 and 1,512 amino acids, respectively, human Cdk12 and Cdk13 are unusually large kinases that share 43% sequence identity and harbour a central kinase domain. The assignment of specific serine phosphorylations in the CTD to particular CDKs remains nonetheless uncertain, because different preferences were reported in different studies. Other kinase substrates such as the transcription elongation factor subunit Spt5 also contribute to the complexity 20. Cdk7 prefers Ser5 as a substrate in vitro, as do its orthologues, budding yeast Kin28 and fission yeast Mcs6; however, Cdk7, Mcs6 and Kin28 have also been implicated in generating the pSer7 mark 21–24.
We have determined the crystal structure of the human Cdk12/CycK pair in complex with ADP and aluminium fluoride as a transition state mimic at 2.2 Å resolution. We show that Cdk12 contains an additional C-terminal helix, αK, outside the canonical kinase fold that makes multiple water-mediated interactions with both the N- and C-terminal lobes as well as the ribose of the bound nucleotide. This helix, followed by a polybasic cluster, is conserved in CTD kinases regulating transcriptional elongation, and its importance was revealed by the gradual loss of kinase activity towards the CTD with progressive truncation of this C-terminal extension. Similar to most CDKs, Cdk12 requires T-loop phosphorylation by a CDK-activating kinase (CAK) for full activity. Finally, Cdk12 exhibits the highest activity on a CTD substrate prephosphorylated at Ser7, suggesting a kinase relay mechanism to order events during the transcription cycle.

Results
Structure of the Cdk12/Cyclin K complex. Human Cdk12/CycK was expressed in baculovirus-infected insect cells co-transfected with Cak1 from S. cerevisiae, and the Cdk-cyclin complex was purified by affinity and size-exclusion chromatography (Supplementary Fig. 1). Crystals were grown in the presence of ADP, aluminium fluoride and the 13-residue substrate peptide P-pS-YSPTSP-pS-YSPT by the hanging drop diffusion technique. The structure was solved at a resolution of 2.2 Å by molecular replacement with the coordinates of CycK and Cdk9 as search models (see Methods). The kinase complex was refined to an Rwork of 19.5% and an Rfree of 24.0% with excellent stereochemistry (Table 1). Two Cdk12/CycK heterodimers make up the asymmetric unit of the protein crystal. No electron density was observed for the substrate peptide, but only for the AlF3 transition state mimic (Fig. 1a,b). Overall, the structures of Cdk12 and CycK in the Cdk12/CycK complex are similar to previously determined structures of other CDKs and cyclins. However, there are substantial differences in the assembly of the two subunits and an unexpected conformation of the kinase C-terminal extension that associates with the ATP substrate. During revision of this manuscript, a crystal structure of Cdk12/CycK at 3.15 Å resolution under the accession number 4CJY was released from the Protein Data Bank.
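For readers less familiar with crystallographic refinement statistics, the quoted R values follow the standard definition below. This is the textbook formula, given here as background and not as a description of the specific refinement protocol used in this study; the symbols F_obs and F_calc (observed and model-calculated structure factor amplitudes) are the conventional ones.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Rwork is evaluated over the reflections used in refinement, whereas Rfree is evaluated over a small test set of reflections that is excluded from refinement, so a modest gap between the two (here 19.5% versus 24.0%) is expected.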
The orientation of the cyclin with respect to the Cdk is rotated by about 25° compared with the cell cycle Cdk-cyclin complexes (for example, Cdk2/CycA; Supplementary Fig. 2). This arrangement is similar to that of Cdk9/CycT1 (ref. 25), but the buried molecular surface area of 2,294 Å² upon Cdk12/CycK complex formation (corresponding to 79% of the Cdk2/CycA interface) is significantly larger than that of Cdk9/CycT1 (1,763 Å², 61%). The Cdk12/CycK interface involves only the N-terminal lobe of the kinase; in contrast, the Cdk2/CycA buried surface area involves both kinase lobes and encompasses 2,892 Å². Cyclins involved in cell cycle control (CycA, CycB and CycE) and those involved in transcription (CycT1 and T2, CycC, CycH and CycK) vary significantly outside the classical cyclin box fold in length and orientation of the adjacent HN and HC helices. The gain in interface area in Cdk12/CycK compared with Cdk9/CycT1 is partly due to additional contacts of the N-terminal region of CycK (residues 19–22) with the N-terminal lobe of the kinase. Extensive electrostatic interactions are made between the second glutamate (E108) of the cyclin motif KΦEEΦ (where Φ is a hydrophobic residue) at the end of helix H3 and R773 of the PITAIRE helix motif of the kinase (Fig. 1c). In addition, R145 in helix H5 of CycK forms multiple interactions with the kinase N-terminal lobe, which are likely to contribute to the specificity of the Cdk12/CycK complex formation because this residue is not conserved in other cyclins (Supplementary Figs 3 and 4). The C-terminal region of CycK following the cyclin box fold is rich in prolines and glutamines and has been omitted for crystallization.
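The percentages quoted for the interface sizes can be reproduced with simple arithmetic. The short sketch below uses only the three buried surface areas given in the text, with Cdk2/CycA as the reference; the variable and function names are illustrative.

```python
# Buried surface areas (in square angstroms) as quoted in the text.
INTERFACE_AREA_A2 = {
    "Cdk2/CycA": 2892.0,   # reference interface, involves both kinase lobes
    "Cdk12/CycK": 2294.0,
    "Cdk9/CycT1": 1763.0,
}


def percent_of_reference(area, reference):
    """Express one interface area as a percentage of a reference area."""
    return 100.0 * area / reference


if __name__ == "__main__":
    reference = INTERFACE_AREA_A2["Cdk2/CycA"]
    for name, area in INTERFACE_AREA_A2.items():
        pct = percent_of_reference(area, reference)
        print(f"{name}: {area:.0f} A^2 = {pct:.0f}% of Cdk2/CycA")
    gain = INTERFACE_AREA_A2["Cdk12/CycK"] - INTERFACE_AREA_A2["Cdk9/CycT1"]
    print(f"Cdk12/CycK buries {gain:.0f} A^2 more than Cdk9/CycT1")
```

The Cdk12/CycK interface is thus about 530 Å² larger than that of Cdk9/CycT1, while still being smaller than the two-lobe interface of Cdk2/CycA.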
Interestingly, the 'hydrophobic pocket' present in CycA and CycE (characterized by the MRAILVDWxxE sequence, residues 210–220, in helix H1 of CycA), which serves as a recruitment site for Cdk2 substrates and the p27 Kip1 inhibitor 26,27, is replaced by the highly polar sequence YRREGARFxxD (residues 49–59) in CycK. In CycA, the acidic residue E220 following the MRAIL motif interacts with the arginine of the RxL recruitment motif in kinase substrates. This position corresponds to D59 in CycK and is accessible for possible interactions with regulatory factors (Supplementary Fig. 5). The same site in CycT1 is partly hidden by the presence of a smaller leucine (L44) compared with F56 in CycK and W217 in CycA. In addition, both CycT1 and CycK contain a long loop inserted between helices H4 and H5 (Supplementary Fig. 3). These differences could generate the recognition specificity for Cdk-Cyclin regulatory factors. Substrates, inhibitors and activators such as the Tat protein or Hexim1 were all shown to interact with this surface patch of the cyclin subunit in different Cdk-Cyclin complexes 26–30.
The kinase active center. The Cdk12 structure exhibits a typical kinase fold comprising the N-terminal lobe (residues 715–816) and the C-terminal lobe (residues 817–1,020) with 41.2% or 36.7% sequence identity to Cdk9 or Cdk2, respectively. Cdk12 is in the active kinase conformation as characterized by the orientation of the αC helix, also known as the PITAIRE helix (PITALRE in Cdk9; Supplementary Fig. 6). The DFG motif (residues 877–879) at the start of the activation segment and the activation segment itself adopt a conformation that allows access of the substrate to the catalytic site. Phospho-T893 within the activation segment is clearly visible in the electron density map and interacts with R858 and R882 (Fig. 1c). The third canonical arginine, R773, is about 6.6 Å away but tightly interacts with the KΦEEΦ motif of CycK, as seen in the stereo image of the electron density map (Fig. 1d).
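Sequence identity figures such as those above are derived from a pairwise alignment. The following is a minimal sketch of one common way to compute percent identity from two already-aligned sequences; it assumes gaps are written as "-" and that gapped columns are excluded from the denominator, which is only one of several conventions and not necessarily the one used for the values reported here. The function name is hypothetical, and the example compares only the helix motif names mentioned in the text, not the full kinase domains.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned, non-gap columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Helix motif in Cdk12 versus Cdk9, as named in the text.
    print(f"{percent_identity('PITAIRE', 'PITALRE'):.1f}% identical")
```

Because the choice of denominator (alignment length, shorter sequence or ungapped columns) changes the result, identity values are only comparable when computed with the same convention.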
The glycine-rich loop GEGTYG (residues 734–739), or P-loop, which functions as a lid atop the ATP phosphates, coordinates the phosphates with the main chain amide group and the side chain hydroxyl group of T737 reaching out to the β-phosphate of the nucleotide. The presence of ADP and AlF3 in the crystallization condition led to the incorporation of ADP·AlF3 and two magnesium ions, MgI and MgII, in the crystal structure (Fig. 1b). The second magnesium ion MgII is coordinated in an almost ideal hexacoordinate octahedral geometry with distances between 1.9 and 2.3 Å. It is coordinated by ADP α- and β-phosphate oxygens, D877 of the DFG motif, N864, one of the AlF3 fluorine atoms, and a single water molecule (Supplementary Fig. 7). The first magnesium MgI is observed in the same position as previously seen for protein kinase A and a Cdk2/CycA transition state complex 31,32. MgI is less well coordinated and the occupancy in the crystal lattice appears reduced, which might go along with its proposed function as the catalytic Mg2+ (ref. 33). The catalytically important D859 points with its carboxyl side chain to AlF3 with a distance of 3.0 Å (Supplementary Fig. 7). The glycine-rich loop and the ATP transition state mimic are far from any crystallographic neighbours in the determined structure, so the conformation is not likely to be a consequence of crystal packing.
A kinase C-terminal helix extends to the bound ATP. While crystallization attempts using a protein construct comprising only the canonical kinase domain were not successful, the structure showed that the protein extends by about 30 residues following the last helix αJ of the kinase domain fold (Supplementary Fig. 8).
A stretch of 23 residues (1,017–1,039) meanders around the C-terminal kinase lobe to associate in the ATP cleft with the bound nucleotide. A DCHEL motif (residues 1,038–1,042) reaches out to the N-terminal lobe, the ATP ribose and the C-terminal lobe, and interacts with the bound nucleotide by multiple water-mediated contacts (Fig. 2a,b). D1038 forms a salt bridge with K743, a polar interaction with Y815 and a water-mediated interaction with D817. The H1040 Nε2 amine interacts directly with the N3 nitrogen of the adenine base at 3.0 Å. The water-mediated interactions of E1041 and L1042 with the nucleotide are weaker, with distances of 2.3 and 3.9 Å, respectively. The HEL motif initiates a C-terminal helix that leads into the polybasic cluster KKRRRQR (residues 1,045–1,051), which is only partially resolved in the crystal structure (Fig. 2c and Supplementary Fig. 8b). This helix bundles in parallel orientation to the C-terminal lobe helix αD. The structure thereby resembles a 'closed conformation' for the bound nucleotide in which the free exchange of ADP for ATP is impaired by the intra-molecular contacts of the C-terminal helix formed with the kinase core domain. The approach of the C-terminal helix αK towards the kinase active site is enabled by the presence of a small glycine (G822) in helix αD, which allows for association of this helix with the bound nucleotide (Fig. 2d). In Cdk2, this position is filled by a lysine (K89) whose amino group points towards the ATP ribose hydroxyl groups 32. Interestingly, a glycine at this position is found in the human CTD kinases Cdk9, Cdk12 and Cdk13 as well as the S. cerevisiae and S. pombe orthologues Bur1, Ctk1, Cdk9 and Lsk1 (Fig. 2c). In contrast, human Cdk2, -3, -4, -5, -6, -7, -8 and -10 all contain bulkier residues such as lysine, threonine, valine, histidine or serine at this position, which might prevent association of a C-terminal extension with the nucleotide.
The appearance of a glycine in helix αD thus correlates with the presence of the HE motif in the C-terminal extension that is followed by a polybasic cluster in the CTD kinases (Fig. 2c).
Strikingly, progressive truncation of the C-terminal extension of Cdk12 led to a corresponding decrease in kinase activity. Whereas truncations from 1,082 to 1,063 produced only a small decrease in activity, truncation of the polybasic cluster at position 1,044 reduced its catalytic activity by approximately fourfold (Fig. 2e). However, additional shortening of the HE motif did not further reduce the kinase activity. These data suggest that the C-terminal extension of Cdk12 including its polybasic cluster is an important element for the catalytic activity of the kinase.
Activation of Cdk12/CycK by Cak1. In contrast to Cdk9/CycT1 (refs 25,34,35), expression of human Cdk12/CycK in the baculovirus expression system did not result in T-loop phosphorylation of the kinase as determined by mass spectrometry. Although the Cdk12/CycK kinase complex lacking T-loop phosphorylation showed some basal activity towards a CTD substrate prephosphorylated at position Ser7, its activity was significantly increased upon coexpression with the CAK from S. cerevisiae (Supplementary Fig. 9a). Mutation of T893 to E to mimic phosphorylation showed no effect on basal kinase activity. Quantitative phosphorylation of a single residue occurred upon coexpression with Cak1, as determined by ESI mass spectrometry (Supplementary Fig. 9b). The site of phosphorylation was identified as the activating T-loop residue T893 by peptide mass fingerprint analysis and confirmed in the crystal structure.
Cdk12 prefers a prephosphorylated CTD substrate. The highly variable patterns of phosphorylation marks in the hepta-repeats of Rpb1 are thought to regulate the elongation rate, RNA processing and chromatin modification 36. Particular combinations of modifications may generate specific docking sites to recruit specific regulatory factors. We designed a series of synthetic CTD peptides, each containing three consensus hepta-repeats that were either unmodified or uniformly phosphorylated at all three Tyr1, Ser2, Thr4, Ser5 or Ser7 positions (Fig. 3a and Table 1). The series also included a peptide containing substitution of serine to lysine at position 7. Surprisingly, despite its phosphorylation at the activating threonine T893 in the T-loop sequence, Cdk12/CycK showed low activity towards the unmodified consensus CTD peptide (Fig. 3b). This differs significantly from P-TEFb, which was highly active on the unmodified CTD, and could phosphorylate every hepta-repeat within an eight-repeat peptide in a 4-h incubation 34. Among the prephosphorylated substrate peptides, pS7-CTD[3] stood out as the optimal Cdk12 substrate. CTD peptides pTyr1, pSer2 and pThr4 were all phosphorylated more efficiently than the unmodified sequence, but even the best of these, pSer2, was modified at only 25% efficiency compared with pSer7. There was no activity above background towards the pSer5 peptide. Similarly, the YSPTSPK sequence was not phosphorylated by Cdk12. Again, this differs markedly from results with P-TEFb, which phosphorylates both consensus and K7-substituted CTD substrates to the same extent and with similar catalytic efficiency 34. Analysis by mass spectrometry revealed that up to three phosphorylations were placed on the pS7-CTD substrate after 4-h incubation, while hardly any phosphorylation was detected on the unmodified peptide (Fig. 3c). This result was confirmed in time course experiments that showed the kinetics and saturation of the reaction (Fig. 3d). We conclude that Cdk12 prefers a substrate prephosphorylated at Ser7 to exhibit maximal kinase activity towards the CTD.
Figure 2. (a) A network of weak water-mediated contacts is formed between E1041 and L1042 of the C-terminal helix and the hydroxyl groups of the ATP ribose that is stabilized by I733, E735 and T737 of the glycine-rich loop. (b) D1038 interacts with residues of the N- and C-terminal kinase lobe, while H1040 directly contacts the adenine base of the bound nucleotide. (c) Sequence alignment of the transcription-associated kinases Cdk12 and Cdk9 from human and their respective orthologues from S. cerevisiae and S. pombe compared with Cdk2. A C-terminal HE-motif followed by a polybasic region is conserved in the CTD kinases. Residues of the N- and C-terminal lobe that interact with the C-terminal helix are shaded grey. (d) Surface representations of Cdk12 (this study) and Cdk2 (3QHR; ref. 32) show the different accessibilities of the ATP ribose in the transcription-associated kinase and the cell cycle kinase. While a lysine residue (K89) in Cdk2 points towards the hydroxyl groups of the ATP ribose, this position is engaged with a small glycine in CTD kinases. (e) The activity of Cdk12 decreases upon truncation of the C-terminal helix. Four different length versions of Cdk12 were tested for their activity with two different kinase substrates, consensus CTD and pS7-CTD, using the same CycK (1-267) construct. Truncation of the polybasic cluster in Cdk12 from residue 1,063 to 1,044 led to a significant loss in activity for the pS7-CTD substrate. Data are the mean ± s.d. of three independent experiments.
[Figure 3 panel labels: peptide sequence YSPTSPS YSPTSPS YSPTSPS; MW = 2,634 Da; cons. CTD[3]; pS7-CTD[3].]
Substrate preferences of Cdk12 and Cdk9 kinases. To explore Cdk12-mediated CTD phosphorylations in more detail, we performed mutagenesis experiments and western blot analyses with either synthetic CTD peptides or native full-length RNA Pol II CTD.
Using a similar strategy as described above, four CTD peptides were generated that contained alanine at either position 2 or position 5 within each repeat, both with and without Ser7 prephosphorylations in each repeat (Fig. 4a). As before, there was little activity towards the non-phosphorylated substrates. However, when the Ser7 position was phosphorylated, the S2A peptide was a better substrate than the S5A peptide, indicating that Cdk12 can phosphorylate Ser5. In a second set of experiments, we placed single Ser2, Ser5 or Ser7 phosphorylation marks at either end of the three-repeat substrate (Fig. 4b). Activity of Cdk12/CycK towards any of these peptides only marginally exceeded that towards the unphosphorylated peptide, with the peptide containing a single pS7 mark at the C terminus being the best substrate. However, the phosphorylation activity was less than one-third of that towards the uniformly phosphorylated pS7-CTD[3] substrate. In contrast, when we used P-TEFb to phosphorylate the same peptides, both pS5-C and pS7-C, but not their N-terminal counterparts, were phosphorylated to a greater extent than the unphosphorylated peptide (Fig. 4c), in line with the recently described priming of the CTD by previous phosphorylation marks 18,24,34. These data suggest that Ser7 prephosphorylations cause no priming effect of the CTD for Cdk12 activity but are instead necessary elements of the kinase recognition motif.
Cdk12/CycK is a promiscuous CTD kinase. To further compare Cdk12 and Cdk9 kinase specificities, we performed western blot analyses using anti-phospho-CTD-specific antibodies raised against pSer2, pSer5, pSer7 and pThr4 phosphorylation marks. The hypophosphorylated form of RNA Pol II, Pol IIA, was purified from HeLa cell extracts using a specific α-Rpb1 antibody, as described 37. The substrate was incubated with recombinant Cdk12/CycK or Cdk9/CycT1 for 60 min (Fig. 5a). In addition to comparing Cdk12 to Cdk9, we also compared two different variants of each kinase, which differed in the length of the C-terminal extension appended to the catalytic domain. Whereas Cdk9 showed a strong preference for Ser5 phosphorylation followed by Ser7 but not Ser2, Cdk12 showed a preference for Ser5 but could also catalyse Ser7 phosphorylation, albeit with much lower catalytic activity than Cdk9. Truncation of the kinase length from residue 1,082 to 1,044 in Cdk12 led to a reduction in activity. No activity was seen towards Thr4, however, for any kinase. A titration from 0.5 to 2.0 µM Cdk12/CycK showed increasing phosphorylations of Ser5 with a faint band observed for Ser7 phosphorylation at high concentrations (Fig. 5b). [...] (lane 5) showed a significantly enhanced Ser5 phosphorylation by the activated kinase (Fig. 5c). Quantitative ELISA data for Cdk12 and Cdk9 kinase activity using a tandem-repeat consensus CTD peptide revealed highest efficacy towards Ser5 phosphorylation for Cdk12 that decreased upon truncation of the kinase C-terminal HE-motif and polybasic region (Fig. 5d). Similarly, Cdk9 showed Ser5 phosphorylation activity but also phosphorylation of Ser7 and Ser2 after 60-min incubation. The latter results were confirmed by time course experiments of P-TEFb that underlined the restriction in the recognition specificity of monoclonal antibodies against multiply phosphorylated CTD epitopes (Supplementary Fig. 10).
To compare the results of recombinant Cdk12 kinase domains with native full-length protein, we first expressed and purified human full-length Cdk12 as a GST fusion protein in baculovirus-infected insect cells using the same strategy with CycK and Cak1 coexpression as before. The protein complex was expressed and purified in triplicate, and the identity of Cdk12 confirmed by peptide mass fingerprint analysis (Supplementary Fig. 11a). In the kinase assay with immunoprecipitated Rpb1, no significant increase in CTD phosphorylation could be observed, suggesting the lack of kinase activation signals (Supplementary Fig. 11b). We therefore overexpressed flag-tagged full-length Cdk12 and Cdk9 as well as the kinase dead variants Cdk12 D877N and Cdk9 D167N, respectively, in HCT116 cells and compared their activity with recombinant Cdk12/CycK. Incubation of GST-CTD with anti-flag immunoprecipitated Cdk12 in an in vitro kinase assay revealed the strongest signal for Ser5 phosphorylation closely followed by Ser2 phosphorylation, while no pSer7 mark was seen (Fig. 5e). The empty vector control and the kinase dead variants showed no phosphorylations of the CTD. Importantly, the recombinant Cdk12 (696-1,082)/CycK (1-267) protein complex exhibited the same phosphorylation specificity as the flag-tagged native complex. Cdk9/CycT1 showed by far the highest activity towards Ser5 phosphorylation followed by pSer7 and pSer2 marks. Input controls of Cdk12/CycK and Cdk9/CycT1 by western blot analyses confirmed the integrity of Cdk-cyclin complexes and equal amounts of kinases in the in vitro kinase assays (Fig. 5f). Together, these data suggest Cdk12 is a promiscuous CTD kinase that has a preference for Ser5 phosphorylations in the in vitro kinase assays.
Figure 3. (a) Peptides contained three consensus hepta-repeats with either no modification (cons. CTD[3]) or with phosphorylation marks continuously set at one residue of the heptad sequence as shown for Ser7 (pS7-CTD[3]). The same design principle was used for CTD peptides phosphorylated at Tyr1 (pY1-CTD[3]), Ser2 (pS2-CTD[3]), Thr4 (pT4-CTD[3]) and Ser5 (pS5-CTD[3]) or a peptide that contained a lysine at position 7 (K7-CTD[3]). (b) Activity of Cdk12/CycK for various CTD substrates. Of the seven peptides tested, Cdk12 showed the highest activity for continuous Ser7 prephosphorylation. Minor activities were detected for Ser2 and Thr4 prephosphorylated peptides. Remarkably, no activity was seen for cons. CTD[3] or the K7-peptide, which is in contrast to P-TEFb. In addition, Cdk12 exhibited no activity on the pSer5 peptide. (c) ESI-MS analyses of CTD peptides before and after 4 h of incubation with 2 µM Cdk12/CycK. Whereas no phosphorylation occurred for the consensus peptide, the pS7-CTD[3] substrate got readily phosphorylated up to three times.
Figure 4. (a) [...] The pS7-CTD substrate instead was readily phosphorylated followed by the S2A-pS7-CTD peptide. The S5A-pS7-CTD peptide exhibited almost no susceptibility as a kinase substrate, suggesting that Cdk12 phosphorylates Ser5 in the context of Ser7 prephosphorylation. Cartoons displaying the peptide templates are shown right. (b) Priming and directionality of Cdk12 phosphorylation. Serine 2, 5 and 7 phosphorylation marks were set either at the N terminus or at the C terminus of triple hepta-repeats. While the activity of Cdk12 towards these peptides was overall very weak, the substrate with the pS7 mark set at the C terminus gained highest recognition. (c) In contrast, Cdk9 exhibits high activity towards C-terminally phosphorylated peptides at either position 5 or position 7, suggesting the priming of the kinase by these prephosphorylations. All data are reported as the mean ± s.d. from three independent experiments.
Inhibition of Cdk12 compared with Cdk9. We tested a series of kinase inhibitors for their ability to inhibit Cdk12/CycK or P-TEFb. To a concentration of 0.5 µM Cdk12/CycK, a 10- or 100-fold excess of inhibitors was added, using 1 mM ATP and 100 µM pS7-CTD[3] substrate. Whereas some compounds such as DRB, Genistein, RO-3306 and Apigenin showed almost no inhibitory effect on Cdk12, Purvalanol A and B, and CR8 reduced Cdk12 activity nearly 10-fold at 100-fold excess concentration (Fig. 6a). Roscovitine, Staurosporine and CVT-313 showed more modest activity towards Cdk12. The highest efficacy, however, was achieved with flavopiridol. Flavopiridol is known as a potent ATP-competitive kinase inhibitor for Cdk9 that is now in clinical trials 25,38.
In contrast, inhibition of P-TEFb was more potent for all of the inhibitors tested (Fig. 6b). [...] towards P-TEFb, whereas only Genistein and RO-3306 were found to be weak inhibitors. Staurosporine and CR8 were of higher potency against Cdk9, but were still >10-fold less potent than flavopiridol, which inhibited P-TEFb activity by ~20-fold even at the lower dose. Because flavopiridol was found to be the best inhibitor against both Cdk9 and Cdk12, we determined comparative in vitro IC50 values at 0.2 µM kinase concentration (Fig. 6c). Although the potency of flavopiridol towards Cdk12/CycK was about one order of magnitude weaker than that for Cdk9, the data suggest that flavopiridol is a specific inhibitor of the CDKs active in transcriptional elongation.
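To illustrate what a tenfold difference in IC50 means for residual kinase activity, the sketch below uses the standard one-site dose-response relation, activity = 1 / (1 + ([I]/IC50)^h). The IC50 values are hypothetical placeholders in arbitrary units chosen only to reflect the stated order-of-magnitude difference; they are not the measured values from Fig. 6c, and the simple model ignores ATP competition and enzyme depletion effects.

```python
def fraction_active(inhibitor_conc, ic50, hill=1.0):
    """Residual activity under a simple one-site dose-response model."""
    return 1.0 / (1.0 + (inhibitor_conc / ic50) ** hill)


# Hypothetical IC50 values (arbitrary units), NOT taken from the paper:
IC50_CDK9 = 1.0
IC50_CDK12 = 10.0 * IC50_CDK9  # "about one order of magnitude weaker"

if __name__ == "__main__":
    for conc in (0.1, 1.0, 10.0, 100.0):
        cdk9 = fraction_active(conc, IC50_CDK9)
        cdk12 = fraction_active(conc, IC50_CDK12)
        print(f"[I] = {conc:>5}: Cdk9 {cdk9:.2f} active, Cdk12 {cdk12:.2f} active")
```

In this idealized picture, an inhibitor concentration that halves Cdk9 activity leaves Cdk12 about 90% active, and a tenfold higher dose is needed to halve Cdk12 activity.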

Discussion
The structure of Cdk12/CycK contains a C-terminal extension outside the canonical kinase lobes that appears unique for CTD kinases implicated in transcription elongation and whose truncation correlates with a loss of activity. Conformational transitions are essential elements of the regulation and reaction mechanism of protein kinases, and the closing together of the N- and C-terminal lobes is a characteristic structural feature of protein kinase activation 39,40. Several kinases have been reported in the past to contain C-terminal extensions that are responsible for their regulation. The calcium/calmodulin-dependent protein kinase I, for example, contains a C-terminal helix-loop-helix segment that interacts with both kinase lobes and keeps the catalytic centre in an autoinhibited state 41. For Cdk12, the function of the HE motif when contacting the ATP substrate could be similar to protein kinase A, whose C-terminal tail undergoes a rearrangement to stabilize the kinase conformation during the catalytic cycle 31,33. The kinetic mechanism of substrate binding and product release, however, is complicated by the multiple phosphorylation sites in a single chain of the CTD. A crystal structure containing the C-terminal tail sequence of Cdk9 was recently reported, suggesting an ordered mechanism in which the phosphorylated protein is the first product to leave the enzyme active site 42. The contribution of the polybasic motif identified here to the enzymatic activity of Cdk12 supports such a mechanism for the association of the negatively charged CTD substrate. The substrate specificity of Cdk12/CycK for CTD serine phosphorylations was analysed in different kinase assays, using either the recombinant kinase domain in combination with the cyclin box domain of CycK purified to homogeneity, full-length recombinant Cdk12 or immunoprecipitated full-length Cdk12 expressed from human cells.
Similarly, multiple substrates were used such as synthetic CTD peptides up to a length of three hepta-repeats that were modified with prephosphorylation signatures or site-specific alanine mutations, a full-length GST-CTD construct containing all 52 human repeats expressed from E. coli or a precipitated form of human Rpb1 from HeLa cells with the CTD in the hypophosphorylated IIA state. All assays show in common that Cdk12 predominantly phosphorylates Ser5 of the CTD, determined either by mutagenesis experiments or by western blot analysis. However, while the use of Rpb1 showed a faint band for Ser7 phosphorylation but none for Ser2 phosphorylation, flag-tagged full-length Cdk12 also exhibited Ser2 phosphorylation in the GST-CTD substrate but not Ser7 phosphorylation. The full-length recombinant Cdk12 kinase expressed from insect cells instead showed no kinase activity, indicating either the loss of activation signals or autoregulatory motifs in regions adjacent to the kinase domain. Together, these data suggest that Cdk12 is a promiscuous CTD kinase that has a preference for Ser5 phosphorylations. It should, however, be noted that co-factors of Cdk12/CycK in cells could change its catalytic activity or even modulate its substrate phosphorylation profile. Experiments using either recombinant or immunoprecipitated P-TEFb confirm Cdk9 as a Ser5 CTD kinase as recently described 34 . A similar observation has been made by in vivo live imaging experiments of RNA Pol II transcription factories in primary cells, where Cdk9 foci colocalized with pSer5 but not pSer2 marks 43 . Chromatin immunoprecipitation assays coupled with deep sequencing (ChIP-seq) showed that the genome-wide occupancy of Cdk9 is similar to RNA Pol II Ser5 phosphorylation, with the highest enrichment in the 5′-end of genes 43 .
The various substrate specificities are partly contained in the electrostatic surface characteristics of the Cdk-cyclin complexes that allow association with the negatively charged CTD substrate when phosphorylated (or primed) at neighbouring repeats. Different basic surface patches are indeed seen for the catalytic sites of Cdk12/CycK compared with Cdk9/CycT1 that could account for the kinase phosphorylation activities (Fig. 7). The cell cycle regulating Cdk2/CycA complex instead exhibits a largely acidic surface at the kinase active site, underlining its preference for the SPx(K/R) and RxL substrate recognition motifs 26 .
It is surprising that in the in vitro kinase assays Cdk12 exhibits low activity towards a CTD that is not prephosphorylated at Ser7. This could indicate an additional layer of regulation, as Cdk12 is only poorly able to phosphorylate a CTD in the preinitiation state but rather requires other kinases to place phosphorylation marks before it properly recognizes the CTD as a substrate. There are three main differences in the activity and substrate specificity of Cdk12 compared with those of Cdk9: (i) Cdk12 requires a preset pSer7 mark for full activity on the CTD, whereas Cdk9 readily phosphorylates the consensus CTD as well as the pSer7 prephosphorylated CTD; (ii) Cdk12 does not phosphorylate a repeat containing Lys7, whereas Cdk9 does; and (iii) Cdk12 appears to place only one phosphorylation mark per pSer7 mark, whereas Cdk9 is able to phosphorylate a CTD substrate at a stoichiometry equalling the number of CTD repeats provided. A preset pSer7 or pSer5 mark, particularly in a C-terminal repeat of a CTD substrate, stimulates Cdk9 activity, whereas Cdk12 seems to be able to place only one phosphorylation N-terminally to the pSer7 mark. The state of premodifications in CTD substrates could indeed be another source of ambiguity between in vitro and in vivo analyses that is, however, inherently linked to the periodic hepta-repeat structure of the substrate.
The misregulation of transcription is increasingly recognized as a cause of a broad range of diseases 44 , and changes in the Cdk12 gene and its expression might predispose to human disease. Mutations of Cdk12 have been identified in genomic analyses of ovarian, breast and lung carcinoma and melanoma 45-47 . Several of these mutations align with the kinase domain of the 164 kDa protein, suggesting the catalytic activity and interaction with CycK are crucial for cell viability. The structure of human Cdk12 with its C-terminal helix associating with the ATP substrate creates the basis for site-directed development of specific inhibitors; for example, the dynamics of the C-terminal extension could be confined by displacing water molecules from this unique interface to the ATP cosubstrate.
The decipherment of the CTD code requires the understanding of the setting of modifications, their recognition and their removal in a spatial and temporal manner, to link these transient modifications to the transcription cycle of various genes 48,49 . The determination of different substrate specificities and enzymatic activities in the two mammalian kinases Cdk9 and Cdk12 is a first step to recognize their contributions to the variability in CTD phosphorylations. The identification of Cdk12/CycK regulating factors will be a next step to understand its function in the regulation of transcriptional elongation.

Methods
Plasmids and proteins. Expression and purification of Cdk12/CycK protein complexes were carried out in insect cells using the MultiBac Turbo system 50 . Synthetic genes comprising the kinase domain of human Cdk12 (UniProt accession number Q9NYV4; residues 696-1,082) and the cyclin box domain of human CycK (O75909; residues 1-300), codon optimized for expression in Trichoplusia ni, were synthesized by GeneArt (Regensburg). Cdk12 was cloned into a modified pACEBac1 acceptor vector (ATG:biosynthetics) including an N-terminal GST affinity tag followed by a TEV (tobacco etch virus protease) cleavage site. CycK was similarly cloned with a TEV-cleavable N-terminal GST affinity tag into the pIDK donor vector. Full-length human Cdk12 was cloned from a cDNA clone (GenBank accession number BC150265.1) purchased from imaGenes (Source BioScience) with restriction sites EcoRI and NotI and inserted with an N-terminal GST tag followed by the TEV site into the pACEBac1 expression vector. Full-length CAK1 (P43568) from S. cerevisiae was cloned into the pIDC donor vector without any affinity tag. All plasmids were confirmed by DNA sequencing before expression.
Vectors were fused by in vitro Cre recombination and applied to Tn7-dependent integration into the baculoviral genome of DH10 MultiBac Turbo cells (ATG:biosynthetics). Recombinant bacmid DNA was isolated and then used to transfect Sf21 insect cells. Liquid culture of Sf21 cells was maintained at 27°C in SF-900 III SFM medium (Invitrogen) shaking at 100 r.p.m. Initial recombinant baculoviruses were amplified in Sf21 cells and used for expression by infecting cells at a density of 1.5 × 10⁶ cells per ml by addition of 2% (v/v) of virus stock V2 (multiplicity of infection (MOI) >1). After 72-96 h, cells were collected by centrifugation, washed in PBS and pellets stored at −80°C.
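The infection conditions above imply a lower bound on the titre of the virus stock. A minimal sketch of that arithmetic follows, assuming a cell density of 1.5 × 10⁶ cells per ml, an inoculum of 2% (v/v) and an MOI above 1, and neglecting the small dilution of the culture by the added stock. The function names are illustrative, and no titre is reported in the text.

```python
def moi(titre_per_ml, cell_density_per_ml, virus_fraction_v_v):
    """Multiplicity of infection: infectious units added per cell.

    titre_per_ml: infectious units per ml of virus stock (not given in the text).
    virus_fraction_v_v: volume of stock added per volume of culture (0.02 = 2%).
    """
    return titre_per_ml * virus_fraction_v_v / cell_density_per_ml


def minimum_titre_for_moi(cell_density_per_ml, virus_fraction_v_v, target_moi=1.0):
    """Smallest stock titre (infectious units per ml) that reaches target_moi."""
    return target_moi * cell_density_per_ml / virus_fraction_v_v


# Values taken from the infection conditions described above
min_titre = minimum_titre_for_moi(cell_density_per_ml=1.5e6, virus_fraction_v_v=0.02)
```

Under these assumptions, a stock would need a titre above roughly 7.5 × 10⁷ infectious units per ml for a 2% (v/v) inoculum to give an MOI greater than 1.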
For large-scale purification of Cdk12/CycK constructs, cells were resuspended in lysis buffer (50 mM Hepes pH 7.6, 500 mM NaCl, 10% glycerol and 1 mM DTT) and disrupted by sonication. The lysate was cleared by centrifugation in a Beckman Optima L-80 XP Ultracentrifuge with a Ti45 rotor (45,000 r.p.m. for 45 min at 4°C) and applied to GST Trap FF columns (GE Healthcare) equilibrated with lysis buffer using an Äkta Prime chromatography system (GE Healthcare). Following extensive washes with 10 column volumes (CV) of lysis buffer and 5 CV of wash buffer (50 mM Hepes pH 7.4-8.2, 1,000 mM NaCl, 10% glycerol and 1 mM DTT), the protein was eluted in elution buffer (50 mM Hepes pH 7.4-8.2, 500 mM NaCl, 10% glycerol, 1 mM DTT and 10 mM glutathione).
Cleavage of the GST tag was achieved by adding TEV protease in a 1/20 ratio and was performed for 20 h at 4°C. The protein solution was concentrated and loaded on a preparative HiLoad 16/60 Superdex 200 prep grade gel filtration column (GE Healthcare) equilibrated in GF buffer (20 mM Hepes pH 7.4-8.2, 400 mM NaCl, 5% glycerol and 2 mM TCEP). Fractions of the main peak containing pure Cdk12/CycK complex as determined by SDS-PAGE were pooled and concentrated. The protein was aliquoted, snap frozen in liquid nitrogen and stored at −80°C. Cdk9/CycT1 as well as human full-length GST-CTD proteins were prepared similarly as described 34 .
Crystallization and structure determination. For crystallization, the purified Cdk12/CycK complex was mixed at 85 µM concentration with ADP, AlF3, MgCl2 and substrate peptide P-pS-YSPTSP-pS-YSPT in molar ratios of 1:8:32:64:8 and incubated on ice for 30 min. Initial crystals were obtained using the hanging drop vapour diffusion technique at 293 K by mixing 1 µl protein solution with 1 µl of the reservoir solution containing 0.1 M Bis-Tris, pH 6.5, 25% PEG 3350 and 0.325 M MgCl2. Crystals grew as clusters that showed high mosaicity while testing on the diffractometer. Micro-seeding ('Beads for Seeds' from Jena Biosciences) was used to obtain large single crystals. The seed stock was obtained by transferring an entire drop with crystals to a microcentrifuge tube containing glass beads and 100 µl of stabilization solution (0.1 M Bis-Tris, pH 6.5, 20.5% PEG 3350 and 0.4 M MgCl2). The seed stock was vortexed for 2 min and serial dilutions were prepared.
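The molar ratios of the crystallization mixture translate directly into component concentrations. The sketch below makes two assumptions: the ratios 1:8:32:64:8 follow the order in which the components are listed (complex, ADP, AlF3, MgCl2, peptide), and the complex is present at 85 µM in the final mixture. The function name is illustrative.

```python
def component_concentrations(protein_conc_um, ratios):
    """Scale molar ratios to concentrations in micromolar.

    protein_conc_um: concentration of the first (reference) component in uM.
    ratios: mapping of component name to molar ratio; the first entry is the
            reference component (dicts preserve insertion order in Python 3.7+).
    """
    reference = next(iter(ratios.values()))
    return {name: protein_conc_um * ratio / reference for name, ratio in ratios.items()}


# Assumed order of the 1:8:32:64:8 ratios, following the listing in the text
ratios = {"Cdk12/CycK": 1, "ADP": 8, "AlF3": 32, "MgCl2": 64, "peptide": 8}
concs = component_concentrations(85.0, ratios)
```

Under these assumptions the mixture would contain 680 µM ADP, 2.72 mM AlF3, 5.44 mM MgCl2 and 680 µM substrate peptide.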
Crystallization drops were set up by mixing protein sample and seed stock dilutions in a 1:1 ratio and crystals were grown using the hanging drop vapour diffusion technique at 293 K. The best crystals grew within 2 weeks to a size of approximately 200 × 30 × 30 µm³ using a 10⁻² dilution of the seed stock. For cryo-protection, crystals were transferred to a solution that contained the stabilizing agents with additional 0.4 mM substrate peptide and 15% ethylene glycol. After 5-10 s soaking, crystals were flash frozen in liquid nitrogen.
Diffraction data were collected at the Swiss Light Source Villigen at 0.9785 Å wavelength and 100 K temperature using the PILATUS 6M detector (oscillation width per frame: 0.25°; 720 and 740 frames collected). The XDS package 51 was used to process, integrate and scale the data. The structure was solved by molecular replacement using the program PHASER 52 ... (Table 1). The peptide used for crystallization (ac-P-pS-YSPTSP-pS-YSPT-amid) contained two phosphorylated serine residues at heptad position 7. For quantitative analysis in ESI-MS experiments or radioactive filter-binding assays, CTD substrate peptides were marked at the C terminus with a double arginine motif separated by a polyethylene glycol spacer for better ionization properties or increased transfer rates, respectively.
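The heptad register of the crystallization peptide can be checked with a short script. The sketch below assumes the consensus repeat YSPTSPS with the first tyrosine taken as position 1, and a notation in which a leading lowercase 'p' marks a phosphorylated residue. The terminal acetyl and amide caps are omitted, and all function names are illustrative.

```python
CONSENSUS = "YSPTSPS"  # consensus CTD hepta-repeat, positions 1-7


def tokenize(peptide):
    """Split a peptide such as 'P-pS-YSPTSP-pS-YSPT' into residue tokens."""
    tokens = []
    for chunk in peptide.split("-"):
        if chunk.startswith("p"):  # lowercase p marks a phospho-residue, e.g. 'pS'
            tokens.append(chunk)
        else:
            tokens.extend(chunk)  # one token per one-letter residue code
    return tokens


def heptad_positions(tokens):
    """Assign heptad positions 1-7, with the first tyrosine as position 1."""
    bare = [tok[-1] for tok in tokens]
    first_tyr = bare.index("Y")
    return [((i - first_tyr) % 7) + 1 for i in range(len(bare))]


tokens = tokenize("P-pS-YSPTSP-pS-YSPT")
positions = heptad_positions(tokens)

# Heptad positions of the phosphorylated residues
phospho_positions = [pos for tok, pos in zip(tokens, positions) if tok.startswith("p")]

# Every residue should match the consensus at its assigned position
matches_consensus = all(tok[-1] == CONSENSUS[pos - 1] for tok, pos in zip(tokens, positions))
```

With this register, both phosphoserines fall on position 7 and every residue of the 13-mer matches the consensus repeat.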
In vitro kinase assays. Radioactive kinase reactions (typically 35 µl) were carried out with recombinant highly purified proteins using a standard protocol similarly as described 34 . For ELISA, the consensus CTD-peptide (YSPTSPSYSPTSPSC; Peptide Specialty Laboratories GmbH, Heidelberg) was coupled to 96-well maleimide plates for 60 min at 37°C in carbonate buffer at pH 9.5. After washing, the kinase assay was performed using recombinant Cdk12 (1.5 µg) or Cdk9 kinase (0.5 µg) in 25 µl kinase buffer (20 mM Tris/HCl (pH 7.4), 20 mM NaCl, 10 mM MgCl2, 1 mM DTT, 1× PhosSTOP and 2 mM ATP) at 28°C for 60 min followed by washing and blocking with PBS/milk (1%) for 30 min. Primary antibodies were added and incubated for 30 min. After an additional washing and blocking step, biotin-coupled secondary antibodies were added for 30 min. Following another washing and blocking step, peroxidase attached to avidin was added to the wells. After washing five times with PBS, 50 µl of substrate buffer (o-phenylenediamine and H2O2; pH 5.0) was added. After colour change, ODs of samples were measured at 405 nm in the ELISA reader.
For the kinase assay with the endogenous Pol II as substrate, Pol II was immunopurified from whole HeLa cell extracts with an antibody recognizing Pol IIA with non-phosphorylated CTD (1C7). Five microlitres of the substrate-coupled Sepharose G beads was incubated with 45 µl kinase buffer B (50 mM Hepes (pH 7.9), 100 mM KCl, 10 mM MgCl2, 200 µM EGTA, 100 µM EDTA, 1 mM DTT, 200 µM ATP, 1 µg BSA, 1× PhosSTOP and 1.5 µg of the recombinant Cdk12 kinase or 0.5 µg of recombinant Cdk9 kinase) at 30°C for 60 min. Laemmli buffer was added (sixfold) and samples were incubated for 5 min at 95°C followed by western blot analysis.
Kinase assays with Flag-tagged proteins. Plasmids pcDNA3.1 3xFlag Cdk12 and pcDNA3.1 3xFlag Cdk9 used for expression of full-length flag-tagged proteins as well as the pcDNA3.1 3xFlag empty vector were described previously 16 . Kinase-dead mutants Cdk12 D877N and Cdk9 D167N were obtained by mutagenesis of the corresponding wild-type plasmids and cloned into the pcDNA3.1 3xFlag expression vector.
One 15-cm plate of HCT116 cells was transfected with 20 µg of plasmid using PEI reagent. After 48 h, cells were collected and lysed in lysis buffer (20 mM Hepes/KOH pH 7.9, 15% glycerol, 0.2% NP-40, 300 mM KCl, 1 mM DTT, 0.2 mM EDTA and protease inhibitor (Sigma)). Flag-tagged proteins were immunoprecipitated from the lysate using 20 µl of flag-agarose (Sigma). The immunoprecipitates were washed three times with 1 ml of the lysis buffer containing 500 mM KCl followed by washing with 1 ml of a detergent-free buffer (20 mM Hepes/KOH pH 7.9, 150 mM KCl, 1 mM DTT, 15% glycerol). The flag-tagged proteins were eluted from the flag-agarose with 40 µl of flag peptide dissolved in 20 mM Hepes pH 7.9, 150 mM KCl, 1 mM DTT. In vitro kinase assays were performed in 20 mM Hepes/KOH pH 7.9, 5 mM MgCl2, 2 mM DTT and 1 mM ATP with either 10 µl of flag eluate or 40 ng of recombinant CycK/Cdk12 and with 300 ng of GST-tagged human full-length CTD as a substrate. A total of 60 µl of kinase reaction was incubated at 30°C for 1 h, and the reaction was terminated by adding 60 µl of 2× SDS sample buffer.
Immunoprecipitations. A total of 4 × 10⁶ HeLa cells were lysed in 200 µl IP buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1% NP-40 (Roche), 1× PhosSTOP (Roche), 1× protease inhibitor cocktail (Roche)) for 20 min on ice. All samples were sonicated on ice using a BRANSON Sonifier 250 (15 s on, 15 s off, 50% duty) and centrifuged at 14,500 r.p.m. for 15 min at 4°C. The supernatant was incubated with 50 µl antibody-coupled protein G-sepharose beads (2.5 µg of antibody 1C7 for 4 h at 4°C, followed by 2 washes with 1 ml IP buffer) rotating overnight. Beads were washed six times with 1 ml IP buffer and used as substrate for the in vitro kinase assay.
Western blots. Samples of protein were collected following treatment using 6× Laemmli buffer. Protein was subjected to SDS-PAGE on a 6.5% gel before transfer to nitrocellulose (GE Healthcare). Membranes were stained with affinity-purified, IR-labelled secondary antibodies against rat (680 nm; Alexa, Invitrogen) and mouse (800 nm; Rockford, Biomol), and stained with HRP-conjugated secondary antibodies against rat (Sigma) or mouse (Promega), and revealed by enhanced chemiluminescence.
Mass spectrometry analyses. Peptide and protein masses were determined by liquid chromatography-electrospray ionization-mass spectrometry using an Agilent 1100 chromatography system and an LCQ Advantage MAX (Finnigan) mass spectrometer operating in positive ion mode. Proteins were applied onto a Vydac RP-C4 column (Grace) at 20% buffer B (CH3CN with 0.08% trifluoroacetic acid (TFA)) in buffer A (H2O plus 0.1% TFA) and eluted with a gradient from 20-80% buffer B at a flow rate of 1 ml min⁻¹. Peptide samples were loaded onto the column at 5% buffer B and eluted with a gradient from 5-80% buffer B. Data evaluation was performed with the Xcalibur, MagTran and Bioworks software packages.
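Phosphorylation stoichiometry can be read from such mass determinations because each phosphate adds approximately 79.97 Da (HPO3) to the substrate. The sketch below illustrates that bookkeeping; the function name, the tolerance and the example masses are hypothetical and are not taken from the measurements reported here.

```python
PHOSPHATE_SHIFT_DA = 79.966  # monoisotopic mass added per phosphorylation (HPO3)


def count_phosphates(mass_unmodified, mass_observed, tolerance_da=0.5):
    """Infer the number of phosphate groups from a mass shift.

    Returns the integer number of phosphates whose combined mass explains the
    observed shift within tolerance_da, or None if no such number exists.
    """
    shift = mass_observed - mass_unmodified
    n = round(shift / PHOSPHATE_SHIFT_DA)
    if n < 0 or abs(shift - n * PHOSPHATE_SHIFT_DA) > tolerance_da:
        return None
    return n


# Hypothetical masses in Da for an unmodified and a triply phosphorylated peptide
example_count = count_phosphates(2000.0, 2000.0 + 3 * PHOSPHATE_SHIFT_DA)
```

The tolerance should reflect the mass accuracy of the instrument; on a low-resolution ion trap a wider window than the placeholder value may be needed for larger substrates.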