Structures of the CDK12/CycK complex with AMP-PNP reveal a flexible C-terminal kinase extension important for ATP binding

Cyclin-dependent kinase 12 (CDK12) promotes transcriptional elongation by phosphorylating the C-terminal domain (CTD) of RNA polymerase II. Structure-function studies show that this activity depends on a C-terminal kinase extension as well as on the binding of cyclin K (CycK). To better define these interactions, we determined crystal structures of the human CDK12/CycK complex with and without the kinase extension in the presence of AMP-PNP. The structures revealed novel features for a CDK, including a large β4-β5 loop insertion that contributes to the N-lobe interaction with the cyclin. We also observed two different conformations of the C-terminal kinase extension that effectively open and close the ATP pocket. Most notably, bound AMP-PNP was observed only when trapped in the closed state. Truncation of this C-terminal structure also diminished AMP-PNP binding, as well as the catalytic activity of the CDK12/CycK complex. Further kinetic measurements showed that the full-length CDK12/CycK complex was significantly more active than the two crystallised constructs, suggesting a critical role for additional domains. Overall, these results demonstrate the intrinsic flexibility of the C-terminal extension in CDK12 and highlight its importance for both ATP binding and kinase activity.


Results
Soluble expression of CDK12 requires cyclin K. Human CDK12 and cyclin K are large multidomain proteins (Fig. 1a). To prepare CDK12 protein for crystallisation trials, a number of kinase domain constructs exploring truncations between residues 661 and 1099 were cloned into a modified pFastBac vector and expressed in Sf9 insect cells. Small-scale nickel-affinity purifications indicated that none of the hexahistidine-tagged constructs was expressed in soluble form (Fig. 1b). Cyclin-dependent kinases display an inherent plasticity that may be stabilized by the binding of their cognate cyclin. Co-expression was therefore performed with suggested cyclin partners, including cyclin K1 (CycK) and cyclin L1 (CycL1). No CycL1 construct was expressed in soluble form, either alone or when co-expressed with CDK12 (Fig. 1b). In contrast, all CDK12 constructs were expressed in soluble form when co-expressed with CycK 11-267 (the cyclin domain corresponding to PDB ID: 2I53), allowing purification of the various CDK12/CycK complexes (Fig. 1b). Mass spectrometry revealed that many of the expressed CDK12 proteins were mono-phosphorylated (Fig. 1c), whereas the shorter construct CDK12 715-1038 was essentially unphosphorylated (Fig. 1d). However, CDK12 715-1038 was phosphorylated in vitro upon incubation with recombinant CAK from Candida albicans (Fig. 1d).
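The mono-phosphorylation calls above rest on a simple mass difference: each phosphate (HPO3) adds about 79.98 Da (average mass) to the deconvoluted intact mass. The minimal sketch below assigns a phosphorylation count to an observed peak. The example masses and the 2 Da tolerance are hypothetical values chosen for illustration, not measurements from Fig. 1c,d.

```python
# Average mass added by one phosphorylation (HPO3), in daltons.
# Average rather than monoisotopic mass (79.966 Da) suits deconvoluted
# intact-protein spectra of large proteins.
PHOSPHO_AVG_DA = 79.98


def count_phosphates(native_mass, observed_mass, tolerance_da=2.0):
    """Return the number of phosphate groups that best explains the mass
    difference between an observed peak and the unmodified protein, or
    None if no whole number of phosphates fits within the tolerance.

    tolerance_da is a hypothetical default; a real threshold depends on
    instrument calibration and deconvolution quality.
    """
    delta = observed_mass - native_mass
    n = round(delta / PHOSPHO_AVG_DA)
    residual = delta - n * PHOSPHO_AVG_DA
    if n < 0 or abs(residual) > tolerance_da:
        return None
    return n


# Hypothetical deconvoluted masses (Da) for one construct.
native = 38000.0
for peak in (38000.0, 38080.0, 38021.0):
    print(peak, count_phosphates(native, peak))
```

A peak about 80 Da above the native mass is read as a single phosphorylation, whereas a shift that is not a near-multiple of 79.98 Da (the 21 Da example) points to some other modification or adduct.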
Structure determination. For scale-up, comparable expression levels of CDK12 and CycK were achieved by adjusting the ratio of the two viruses used for co-expression. The expressed complexes were purified to homogeneity by nickel-affinity and size-exclusion chromatography. Crystals obtained with the non-phosphorylated CDK12 715-1038 protein exhibited multiple lattices and could not be indexed. However, diffraction-quality crystals in the monoclinic space group P2₁ were obtained when the same complex was phosphorylated with CAK. The resulting CDK12 715-1038/CycK 11-267 structure was solved by molecular replacement using CDK9 (PDB ID: 4BCG) 32 and CycK (PDB ID: 2I53) 33 as search models and refined at 3.15 Å resolution. Diffraction-quality crystals in space group P2₁ (but with different cell dimensions) were also obtained for the larger CDK12 715-1052/CycK 11-267 complex, which was mono-phosphorylated after expression without further CAK treatment (Fig. 1c). This structure was refined at 3.15 Å resolution and similarly contained two protein complexes in the asymmetric unit (see Table 1 for data collection and refinement statistics for both structures).
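Space group P2₁ has two general equivalent positions, (x, y, z) and (−x, y + ½, −z) in the conventional unique-axis-b setting, related by a twofold screw axis. With two CDK12/CycK complexes in the asymmetric unit, each unit cell therefore holds four complexes. The sketch below applies the symmetry operators in fractional coordinates. The example coordinate is arbitrary, and the unique-axis-b setting is an assumption because the cell parameters appear only in Table 1.

```python
from fractions import Fraction

# General positions of space group P2_1 (unique axis b), encoded as
# (sign_x, shift_y, sign_z): (x, y, z) and (-x, y + 1/2, -z).
P21_OPERATORS = [(1, Fraction(0), 1), (-1, Fraction(1, 2), -1)]


def p21_equivalents(x, y, z):
    """Return the symmetry-equivalent fractional coordinates of (x, y, z)
    in P2_1, wrapped into the unit cell [0, 1)."""
    return [
        ((sx * x) % 1, (y + ty) % 1, (sz * z) % 1)
        for sx, ty, sz in P21_OPERATORS
    ]


def complexes_per_cell(complexes_per_asu, n_general_positions=len(P21_OPERATORS)):
    """Number of copies in the unit cell (Z) for a given asymmetric unit."""
    return complexes_per_asu * n_general_positions


site = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))  # arbitrary example
print(p21_equivalents(*site))
print(complexes_per_cell(2))  # two complexes per asymmetric unit
```

Exact fractions are used so that applying the screw operation twice visibly returns the original site (a full lattice translation along b).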
The electron density maps for both structures were generally of high quality. In the complex of the longer CDK12 construct, the CDK12 chains were traced between residues Asp718 and Gln1050, except for residues Asp798-Ala799 and Ser889 in chain A and residues Arg1049-Gln1050 in chain C, which were excluded from the model. CycK chains B and D were traced between residues Thr20-Met265 and Pro22-Lys262, respectively. In the structure of the shorter construct, the electron density was most complete for the complex comprising chains A and C. CycK (chain A) was modelled between residues Pro22 and Gln260, and the bound CDK12 subunit (chain C) between residues Trp719 and Pro1031, except for two regions of poor electron density (Gln797-Lys803 and Lys975-Lys976).
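The traced ranges and excluded residues listed above translate directly into modelled residue counts. The short sketch below does that bookkeeping, treating every stated range as inclusive (an assumption). It reproduces only the arithmetic implied by the text and does not read the deposited coordinates.

```python
def modelled_residues(first, last, excluded=()):
    """Count residues in an inclusive traced range [first, last], minus
    any inclusive (start, end) ranges that were left out of the model."""
    traced = set(range(first, last + 1))
    for start, end in excluded:
        traced -= set(range(start, end + 1))
    return len(traced)


# CDK12 715-1052 structure (ranges as stated in the text)
chain_a = modelled_residues(718, 1050, excluded=[(798, 799), (889, 889)])
chain_c = modelled_residues(718, 1050, excluded=[(1049, 1050)])
# CDK12 715-1038 structure, CDK12 chain C
short_chain_c = modelled_residues(719, 1031, excluded=[(797, 803), (975, 976)])

construct_length = 1052 - 715 + 1  # residues in the CDK12 715-1052 construct
print(chain_a, chain_c, short_chain_c)  # 330 331 304
print(f"chain A coverage: {chain_a / construct_length:.1%}")
```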
The structures show three distinct protein complexes. The different chains in the CDK12/CycK structures share a common heterodimeric assembly of their folded domains but differ significantly at their C-termini, yielding three distinct protein complexes (Fig. 2a). The longer CDK12 715-1052 chains are distinguished by an additional αK helix that was also observed by Bösken et al. 22 However, the current structures reveal previously unseen conformational plasticity in this C-terminal extension, which packs alongside the ATP pocket in chain A but at the back of the kinase domain in chain C (Fig. 2a). Strikingly, the ligand AMP-PNP, included in all crystallisations, is bound only in chain A, where its interactions are stabilized by the presence of the αK helix (Fig. 2a). Similarly, there is very poor or no electron density for AMP-PNP in the structure of the shorter CDK12 715-1038 construct, further highlighting the importance of the αK extension.
Figure 1c,d (caption). (c) Deconvoluted intact mass spectra for CDK12 715-1052 purified following co-expression with CycK 11-267. The two mass peaks represent the native protein as well as a larger species containing a single phosphorylation on CDK12 Thr893. (d) Deconvoluted intact mass spectra for CDK12 715-1038 obtained before and after treatment with recombinant CAK from Candida albicans. The shorter CDK12 construct is essentially unphosphorylated until CAK treatment.
Perhaps as a result of these varied features, other subtle differences are also observed across the available CDK12 structures. Overall, the nucleotide-bound chains show a more closed ATP pocket owing to tighter packing of their glycine-rich loop (Fig. 2b). Closer packing of the αC helix is also observed in the CDK12 715-1052 chains, resulting in a subtle twist of their N-terminal lobe (Fig. 2b) and a small shift in the position of the bound cyclin (Fig. 2a). The CycK structure appears rigid and is essentially unchanged from that of the unbound protein 33. Differences are observed only at the N- and C-termini and in a flexible region centred on the H4' helix of the second cyclin box (Fig. 2c).
Structural features of the active CDK12 kinase domain. The CDK12 chains display an active kinase domain conformation, consistent with their cyclin interaction and their phosphorylation at the activation loop residue Thr893 (Fig. 3a). Correct positioning of the αC helix in the AMP-PNP-bound structure is confirmed by the canonical salt bridge between the catalytic residues Lys756 (β3) and Glu774 (αC) (not shown). A hydrophobic regulatory spine is also established across the N- and C-terminal lobes by CDK12 residues Leu778, Met789, His857 and Phe878 (Fig. 3b). Phosphorylation of Thr893 additionally stabilizes both the activation and catalytic loops through hydrogen bonds with Arg882 and Arg858, respectively (Fig. 3c). Arg882 is the site of one of eight deleterious CDK12 mutations identified in ovarian cancer (Fig. 3d) 28. Its mutation to leucine significantly impairs kinase activity 29 and is predicted to break critical interactions between phospho-Thr893 and the activation loop.
Interactions in the CDK12/CycK interface. Overall, the binding of CDK12 and CycK fits the model for the transcriptional CDKs first established by the structure of the CDK9/CycT1 complex 34. As expected for this CDK class, interactions in the protein-protein interface are limited to the kinase N-lobe and the first cyclin box motif (Fig. 2a). The new structures reveal a large β4-β5 loop, which forms a notable insertion in CDK12 and adds to the overall binding surface area (Fig. 4a). The β4-β5 loop sits atop CycK, where it inserts CDK12 Phe802 into a nest of hydrophobic residues, including Val142, Val143 and Ile146 from CycK helix H5. This packing is further stabilized by the intervening CycK residue Arg145, which adopts two alternative rotamers in the asymmetric unit, hydrogen bonding with CDK12 Thr794 in one and Gly807 in the other (Fig. 4a). Such heterogeneity likely reflects the intrinsic flexibility of the β4-β5 loop, which makes few additional interactions apart from crystal packing contacts. Below the β4-β5 loop, the CDK12 'PITAIRE' helix (αC) packs between CycK helices H3 and H5 (Fig. 4b).
Here, the core of the protein-protein interface is hydrophobic, with contributions from the kinase N-terminus (Trp719) as well as the αC helix and the surrounding β-sheet (Fig. 4b). Beyond the hydrophobic core, a number of electrostatic interactions form the periphery of the binding interface, including a salt bridge between CDK12 Arg773 (αC) and CycK Glu108 (H3) (Fig. 4c).
Conformational plasticity of the C-terminal kinase extension. An unexpected feature of the CDK12 715-1052/CycK 11-267 complex structure is the conformational plasticity of the C-terminal kinase extension. The conformations of the two CDK12 molecules in the asymmetric unit diverge after Leu1025, which packs against the αE helix at the back of the kinase domain (Fig. 5a). The AMP-PNP-bound complex adopts a conformation similar to that reported previously by Bösken et al. 22 In these structures, the C-terminal kinase extension wraps around the front of the ATP pocket, running parallel to the β6 strand before turning away perpendicularly (Fig. 5a). By contrast, the C-terminus in the second CDK12 complex continues across the back of the kinase domain and packs within 4.3 Å of the bound CycK subunit (Fig. 5a). Interestingly, the αK helix, encompassing residues 1040-HELWS-1044, is formed in both CDK12 conformations, suggesting that this secondary structural element is stably folded (Fig. 5b). The C-terminal kinase extension has been shown to be critical for CDK12 activity 22. When it packs against the ATP pocket, this element encloses the bound AMP-PNP molecule and adds significantly to the pocket surface area (Fig. 5c). In addition to contacts from His1040 and Glu1041 to AMP-PNP, there are stabilizing interactions with the kinase hinge region, including a hydrogen bond between Asp1038 and the hinge residue Tyr815 (Fig. 5d). Van der Waals interactions with the N-lobe β1 strand are additionally established by both His1040 and Leu1042. Finally, there are hydrophobic and electrostatic interactions with the C-lobe, including a hydrogen bond between the main-chain amide of Glu1041 and the side chain of Asp819. The current work also extends the visible structure to the polybasic tail region comprising 1045-KKRRRQ-1050, which forms a putative flexible interaction site for capture of the CTD substrate.
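Distances such as the 4.3 Å closest approach between the displaced C-terminus and CycK are typically obtained by scanning all atom pairs between two selections. The minimal sketch below shows that calculation in plain Python. The atom labels and coordinates are toy values invented for illustration, not taken from the deposited model; a real analysis would first parse the coordinates with a structure library.

```python
import math
from itertools import product


def closest_approach(atoms_a, atoms_b):
    """Return (distance, label_a, label_b) for the closest pair of atoms
    between two selections, each a dict {label: (x, y, z)} in angstroms.
    Brute force over all pairs, which is adequate for small selections."""
    best = None
    for (label_a, xyz_a), (label_b, xyz_b) in product(atoms_a.items(), atoms_b.items()):
        d = math.dist(xyz_a, xyz_b)
        if best is None or d < best[0]:
            best = (d, label_a, label_b)
    return best


# Toy coordinates (hypothetical, for illustration only).
tail = {"tail_atom_1": (0.0, 0.0, 0.0), "tail_atom_2": (10.0, 0.0, 0.0)}
cyclin = {"cyc_atom_1": (0.0, 0.0, 4.0), "cyc_atom_2": (20.0, 0.0, 0.0)}

distance, a, b = closest_approach(tail, cyclin)
print(f"{a} - {b}: {distance:.1f} Å")  # tail_atom_1 - cyc_atom_1: 4.0 Å
```

The all-pairs scan grows with the product of the two selection sizes; for whole chains, a spatial neighbour search such as a k-d tree is the usual substitute.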
Full-length CDK12/CycK is required for maximal activity. In vitro kinase assays using a GST-CTD substrate were performed to compare the activities of the crystallised CDK12/CycK complexes with those of the full-length proteins. The activity of the full-length CDK12/CycK1 complex was comparable to that of the equivalent CDK9/CycT1 complex (Fig. 6). By contrast, the complex comprising CDK12 715-1052 exhibited a 10-fold reduction in activity, while the activity of the CDK12 715-1038 truncation was severely diminished (Fig. 6). Additional kinetic analyses were performed to better understand the effects of CDK12 truncation (Fig. 7). The Km values determined for ATP were in the typical range for protein kinases. Nevertheless, the full-length CDK12/CycK1 complex had a Km for ATP of 2 μM, notably lower than that of the CDK12 715-1052/CycK 11-267 complex (25 μM) (Fig. 7a,b). Similar differences were observed for the CTD substrate: the full-length complex showed a Km for CTD of 0.3 μM, whereas the truncated CDK12 715-1052 complex had a higher Km of 2 μM, indicating poorer apparent substrate binding (Fig. 7c,d). These differences suggest that other domains within the full-length proteins make important contributions to substrate interactions.
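The Km values above come from Michaelis-Menten kinetics, v = Vmax[S]/(Km + [S]). The text does not state how the curves were fitted, so the sketch below uses one swapped-in approach, the Hanes-Woolf linearisation [S]/v = [S]/Vmax + Km/Vmax, on synthetic noise-free data generated with Km = 2 μM and Vmax = 1. For noisy data, direct nonlinear regression is generally preferable. The sketch then uses the reported ATP Km values to show the fraction of Vmax each complex would reach at an illustrative 10 μM ATP, an assumed concentration rather than one from the assays.

```python
def hanes_woolf_fit(substrate, velocity):
    """Estimate (Km, Vmax) by ordinary least squares on the Hanes-Woolf
    form [S]/v = (1/Vmax)[S] + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    return intercept * vmax, vmax


def fraction_of_vmax(s, km):
    """Michaelis-Menten saturation v/Vmax at substrate concentration s."""
    return s / (km + s)


# Synthetic, noise-free data (hypothetical): Km = 2 uM, Vmax = 1.
atp_um = [0.5, 1, 2, 5, 10, 20, 50]
rates = [1.0 * s / (2.0 + s) for s in atp_um]
km, vmax = hanes_woolf_fit(atp_um, rates)
print(f"Km = {km:.2f} uM, Vmax = {vmax:.2f}")  # Km = 2.00 uM, Vmax = 1.00

# Reported Km(ATP): full length 2 uM, CDK12 715-1052 complex 25 uM.
for name, km_atp in (("full length", 2.0), ("715-1052", 25.0)):
    print(name, f"{fraction_of_vmax(10.0, km_atp):.2f}")  # at 10 uM ATP
```

At this assumed ATP level, the 12.5-fold difference in Km would leave the truncated complex far from saturation while the full-length complex is close to it. The same function applies to the CTD Km values (0.3 versus 2 μM).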

Discussion
The structures provide three different snapshots of the CDK12/CycK complex in its phosphorylated, active configuration. The most significant conformational changes occur in the C-terminal kinase extension, which features a large flexible linker followed by the unusual αK helix. Plasticity in this region is consistent with earlier observations for CDK9 35. This CTD-directed kinase contains a C-terminal tail that folds similarly across the ATP-binding site, although its αK helix adopts a distinct position. Flexibility in this tail has been suggested to allow successive opening and closing of the ATP pocket for cycles of ADP release and ATP capture 35. Indeed, AMP-PNP was stably bound in our CDK12 structures only when trapped by the αK helix, whereas the alternative fold of the other CDK12 chains illustrates the capacity of this helix to dissociate transiently. In the current crystal form, the dissociated linker and αK helix were folded across the back of the kinase domain in a manner reminiscent of the MAP kinases 36. However, the limited number of specific contacts suggests that this region may be free to explore multiple conformations in solution. Molecular dynamics simulations could be used in future to investigate these motions.
The CDK12 kinase acts late in the transcriptional cycle and is expected to engage a negatively charged CTD substrate 22. The final C-terminal positions in the CDK12 715-1052 structure form a polybasic cluster that is also loosely conserved in CDK9. The folding of the C-terminal extension across the ATP pocket has therefore been hypothesized to be an adaptation that supports recruitment of the CTD substrate by providing a complementary charged surface 22,35. Dynamic movement of the C-terminus may also promote recruitment by increasing the search space available for protein-protein interactions. Further biochemical and structural analyses are required to test these hypotheses.
It is also striking that our truncated CDK12/CycK complexes are significantly less active than the full-length proteins. This points to additional roles for other domains and likely also for other proteins in the cell. Notably, CDK12 contains multiple arginine-serine-rich (RS) motifs that may facilitate the recruitment of RNA processing factors to couple transcription and RNA splicing. Proline-rich motifs (PRMs) are also found in both CDK12 and CycK and may form recognition sites for SH3 or WW domain adaptors. A multiprotein CDK12/CycK complex might be expected to remain closely associated with the CTD to allow successive phosphorylation of multiple heptad repeats.
The CDK12/CycK structures also reveal further details of this critical protein-protein interaction. The cyclin-binding interface of the transcriptional CDKs is notably smaller than those of the CDKs regulating the cell cycle. We observe that this loss is offset slightly in CDK12 by an unusual β4-β5 loop insertion that packs atop the H5 helix of the first cyclin box motif. Moreover, this loop is conserved in the CDK13 kinase domain, which also binds CycK. However, the transcriptional CDK class lacks a key interaction between the cyclin and the kinase activation segment; indeed, the cyclin interaction in these kinases is restricted to the kinase N-lobe. Structural studies of the CDK9/CycT1 complex have revealed that such interactions are instead mediated by accessory proteins that direct kinase activity, such as HIV-1 Tat 37. In this quaternary complex, the Tat protein contacts both the CDK9 and CycT1 subunits and modulates their protein-protein interaction. The kinase activation segment in CDK12 is similarly exposed and available to bind other partners. It will be interesting to discover whether any factors are identified that parallel the CDK9 interaction of HIV-1 Tat.
Overall, this work extends our understanding of the structural mechanisms that determine the activity of the CDK12/CycK complex. Further, it emphasizes the importance of protein flexibility as well as the contribution of regulatory elements outside the core catalytic domains. Finally, the packing of the αK helix offers a novel ATP pocket environment for the design of specific inhibitors targeting this important kinase in the DNA-damage response.

Methods
Cloning. For structural studies, various constructs of human CDK12 (Uniprot Q9NYV4), human CycK (Uniprot O75909) and human CycL1 (Uniprot Q9UK58) were cloned into the baculoviral transfer vector pFB-LIC-Bse by ligation-independent cloning. The vector encodes an N-terminal hexahistidine tag and a Tobacco Etch Virus (TEV) protease cleavage site. Bacmid DNA was prepared from Escherichia coli strain DH10Bac and used to generate baculovirus in Sf9 insect cells. Full-length CDK12 isoform 1 (NM_016507.2) and CycK1 15 were cloned into pDONR221 and recombined into bacmids using the Invitrogen Baculovirus Expression System with Gateway Technology. The full-length proteins were N-terminally tagged with GST or 6xHis epitopes prior to bacmid generation according to the manufacturer's instructions (Invitrogen pDEST10 and pDEST20 vectors).
Protein expression and purification. Baculoviruses for the different CDK12 and cyclin constructs were used to co-infect Sf9 cells grown in suspension to a density of 2 × 10⁶ cells/mL. For small-scale testing, 3 mL cultures were grown in a 24-well block and harvested 72 hours post-infection. Cells were lysed by sonication in binding buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 5% glycerol, 5 mM imidazole, 0.5 mM TCEP) supplemented with protease inhibitor cocktail set III (Calbiochem), and the lysate was clarified by centrifugation. The expressed proteins were purified using nickel-sepharose resin (GE Healthcare) and analysed by SDS-PAGE. Proteins for crystallisation were expressed similarly at the 1 L scale. After sonication, polyethylenimine was added to a final concentration of 0.15% to precipitate DNA, and the cell lysate was clarified by centrifugation at 21,000 RPM for 1 hour at 4 °C. CDK12/CycK complexes were purified initially by nickel-sepharose chromatography (GE Healthcare) and eluted stepwise with imidazole before tag cleavage with TEV protease. Complexes containing CDK12 715-1038 were additionally phosphorylated using recombinant Cak1 from Candida albicans. For final clean-up, proteins underwent reverse nickel-affinity purification and size-exclusion chromatography on a HiLoad Superdex S75 26/60 column (GE Healthcare) equilibrated in 50 mM HEPES pH 7.5, 300 mM NaCl and 1 mM TCEP.
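Centrifugation speeds quoted in RPM, such as the 21,000 RPM clarification step above, translate into relative centrifugal force through RCF = 1.118 × 10⁻⁵ × r × RPM², where r is the rotor radius in cm. The rotor is not specified, so the sketch below assumes a hypothetical 10 cm radius. It also computes the volume of polyethylenimine stock needed to reach 0.15% final; the 10% stock and 100 mL lysate volume are likewise assumptions.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a rotor radius in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2


def stock_volume_to_add(lysate_ml, stock_percent, final_percent):
    """Volume of stock to add so that the final mixture (lysate + stock)
    reaches final_percent: solves C_s * V_s = C_f * (V_lysate + V_s)."""
    return final_percent * lysate_ml / (stock_percent - final_percent)


# Hypothetical rotor radius of 10 cm (rotor not stated in the Methods).
print(f"{rcf_from_rpm(21000, 10.0):,.0f} x g")  # 49,304 x g
# Hypothetical 10% PEI stock added to 100 mL of lysate.
print(f"{stock_volume_to_add(100.0, 10.0, 0.15):.2f} mL")  # 1.52 mL
```

Solving for the added volume, rather than using C1V1 = C2V2 on the lysate volume alone, accounts for the small dilution caused by the stock itself.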
For expression of the full-length CDK12/CycK1 complex, baculoviruses for the epitope-tagged CDK12 and CycK1 were co-infected at a ratio of 4:1 and incubated at 28 °C with shaking at 150 rpm. Cells were harvested 48-72 hours post-infection. The full-length protein complexes were purified by tandem affinity chromatography using either the 6xHis-CDK12/GST-CycK1 or the GST-CDK12/6xHis-CycK1 combination. Cell pellets were resuspended in CDK lysis buffer (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 2 mM β-mercaptoethanol, 0.5 mM EDTA, 10 mM β-glycerophosphate, 0.5 mM sodium orthovanadate, 2 mM NaF, 0.2% v/v NP-40 and EDTA-free complete protease inhibitor cocktail (Roche)) 38. The lysate was incubated on ice for 30 minutes with an additional 0.5 M NaCl and occasional manual mixing. The lysate was then sonicated and clarified by centrifugation. Proteins were captured overnight at 4 °C using Ni-NTA agarose (Qiagen) pre-equilibrated with CDK equilibration buffer (10 mM Tris-HCl pH 7.6, 500 mM NaCl, 10% glycerol, EDTA-free complete protease inhibitor cocktail). The Ni-NTA agarose was washed 3× with CDK equilibration buffer, and the bead slurry was transferred to a disposable column for stepwise elution with CDK equilibration buffer supplemented with 15, 25, 100 and 200 mM imidazole. The eluted protein complexes were dialyzed overnight against CDK activation buffer (12.5 mM Tris-HCl pH 7.5, 150 mM NaCl, 10 mM MgCl2, 1 mM EGTA, 5 mM β-glycerophosphate, 0.5 mM sodium orthovanadate, 2 mM DTT, 0.01% Triton X-100, 10% glycerol, EDTA-free complete protease inhibitor cocktail) to exchange the buffer and remove imidazole. The protein was concentrated using an Amicon filtration device with a 30 kDa molecular weight cut-off, and the retentate was incubated with 500 μM ATP at 30 °C for 1 hour to allow auto-activation of the CDK12 kinase. Subsequently, the proteins were further purified using glutathione agarose beads (Pierce).
Following overnight batch binding at 4 °C, the beads were washed 4× with CDK activation buffer and the bead slurry was transferred to a disposable column. Bound protein was eluted stepwise with elution buffer (100 mM Tris-HCl pH 7.5, 300 mM NaCl, 1.0 mM EDTA, 0.04% Triton X-100, 4 mM DTT) supplemented with 10 mM and 20 mM glutathione. The purity of the eluted fractions was confirmed by SDS-PAGE or Western blotting before storage at −80 °C in 50 mM Tris pH 7.5, 150 mM NaCl, 0.5 mM EDTA, 0.02% Triton X-100, 2 mM DTT, 50% glycerol. Protein concentration was determined by Bradford assay.
Kinase assays. Kinase assays were performed as described previously 15. In brief, purified CDK complexes were incubated in kinase assay buffer (50 mM Tris pH 7.5, 100 mM NaCl, 10 mM MgCl2, 0.1% v/v NP-40, 1 mM DTT, 20 μM β-glycerophosphate, EDTA-free complete protease inhibitor cocktail) with varying amounts of cold ATP/[γ-32P]ATP and GST-CTD substrate, which contains all 52 heptad repeats of the human RNA Pol II C-terminal domain 15. Reactions were incubated in a circulating 30 °C water bath for 15, 30 or 60 minutes and stopped by adding 6× SDS-PAGE loading dye. Samples were heated at 85 °C for 5 minutes and resolved by 4-12% SDS-PAGE. Gels were then dried on 3MM Whatman paper and imaged with a FujiFilm FLA-7000 scanner. Phosphorylated bands were quantified using FujiFilm MultiGauge™ software.
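Turning band intensities from a radiolabelled kinase assay into moles of transferred phosphate requires the specific activity of the ATP pool, that is, the signal per pmol of total (hot plus cold) ATP. This is usually measured by spotting a known amount of reaction mix. The sketch below shows that conversion together with a decay factor based on the 32P half-life of about 14.3 days. All numeric inputs are hypothetical, and the Methods do not state whether such a calibration was used; relative comparisons between samples imaged on the same screen do not need it.

```python
P32_HALF_LIFE_DAYS = 14.26  # physical half-life of phosphorus-32


def decay_factor(days_elapsed):
    """Fraction of 32P activity remaining after the given number of days."""
    return 0.5 ** (days_elapsed / P32_HALF_LIFE_DAYS)


def phosphate_transferred_pmol(band_signal, standard_signal, standard_atp_pmol):
    """Convert a band signal to pmol phosphate using a spotted standard that
    contains standard_atp_pmol of total ATP and gave standard_signal.
    Both signals must come from the same exposure, so decay cancels."""
    signal_per_pmol = standard_signal / standard_atp_pmol
    return band_signal / signal_per_pmol


# Hypothetical numbers for illustration only.
print(phosphate_transferred_pmol(band_signal=4.0e3, standard_signal=1.0e6,
                                 standard_atp_pmol=5000.0))  # 20.0
print(f"{decay_factor(14.26):.2f}")  # 0.50 of the original activity remains
```

The decay factor matters only when signals from exposures taken on different days are compared directly.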
Mass spectrometry. Protein masses were determined using an Agilent LC/MSD TOF system with reversed-phase high-performance liquid chromatography coupled to electrospray ionization and an orthogonal time-of-flight mass analyser. Proteins were desalted prior to mass spectrometry by rapid elution off a C3 column with a gradient of 5-95% isopropanol in water with 0.1% formic acid. Spectra were analysed using the MassHunter software (Agilent).
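Electrospray ionisation produces a series of multiply protonated ions, [M + zH]^z+, so each peak at m/z relates to the neutral mass M by M = z(m/z − 1.00728), where 1.00728 Da is the proton mass. For two adjacent peaks, the charge of the higher-m/z peak is z = (p_low − 1.00728)/(p_high − p_low). The sketch below illustrates that relationship on synthetic peaks generated from a hypothetical 38,000 Da protein. It is not the MassHunter deconvolution algorithm.

```python
PROTON_DA = 1.007276  # proton mass in daltons


def charge_from_adjacent(mz_low, mz_high):
    """Charge of the higher-m/z peak of two adjacent charge states
    (the lower-m/z peak carries one more proton)."""
    return round((mz_low - PROTON_DA) / (mz_high - mz_low))


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]^z+ ion observed at m/z."""
    return charge * (mz - PROTON_DA)


def mz_for(mass, charge):
    """m/z of [M + zH]^z+ for a given neutral mass (used to simulate peaks)."""
    return mass / charge + PROTON_DA


# Synthetic adjacent peaks from a hypothetical 38,000 Da protein.
p30, p31 = mz_for(38000.0, 30), mz_for(38000.0, 31)
z = charge_from_adjacent(p31, p30)
print(z, f"{neutral_mass(p30, z):.1f}")  # 30 38000.0
```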
Crystallisation. The CDK12 715-1038/cyclin K 11-267 complex was buffered in 50 mM HEPES pH 7.5, 500 mM NaCl, 5% glycerol, 5 mM imidazole, 0.5 mM TCEP, 10 mM DTT and concentrated to 4.3 mg/mL. The non-hydrolyzable ATP analogue adenylyl imidodiphosphate (AMP-PNP) was added to a final concentration of 1 mM. Crystals were grown at 4 °C in 150 nL sitting drops by mixing 100 nL protein solution with 50 nL of a reservoir solution comprising 20% PEG3350, 0.1 M Bis-tris propane pH 6.5, 0.2 M sodium nitrate, 10% ethylene glycol and 1 mM MgCl2. Before mounting, crystals were cryo-protected with mother liquor supplemented with an additional 15% ethylene glycol and vitrified in liquid nitrogen. The CDK12 715-1052/cyclin K 11-267 complex was buffered in 50 mM HEPES pH 7.5, 300 mM NaCl, 1 mM TCEP, 10 mM DTT, 5 mM L-arginine, 5 mM L-glutamate and concentrated to 6 mg/mL. AMP-PNP was added to a final concentration of 1 mM together with 5 mM MgCl2. Crystals were grown at 20 °C in 150 nL sitting drops by mixing 50 nL protein solution with 100 nL of a reservoir solution comprising 20% PEG3350 and 150 mM DL-malic acid. Before mounting, the crystals were cryo-protected with mother liquor supplemented with an additional 15% ethylene glycol, 3 mM AMP-PNP and 5 mM MgCl2, and vitrified in liquid nitrogen.
Structure determination. Diffraction data were collected at 100 K on Diamond Light Source beamline I24 for the CDK12 715-1038/cyclin K 11-267 complex and on beamline I02 for the CDK12 715-1052/cyclin K 11-267 complex. Data were indexed and integrated using XDS 39 and scaled using AIMLESS 40 in the CCP4 suite of programs 41. Phases were obtained by molecular replacement in PHASER 42 using search models generated by CHAINSAW 43. The structures of CDK9 (PDB ID: 4BCG) 32 and cyclin K (PDB ID: 2I53) 33 were used as search models for the CDK12 715-1038/cyclin K 11-267 complex.
The resulting CDK12 and cyclin K structures were then used as search models for the CDK12 715-1052/cyclin K 11-267 complex. Models were built initially in COOT 44 and then refined and modified through alternating rounds of REFMAC5 45 and COOT, with the later rounds of refinement performed in PHENIX 46. The refined structures were validated with MolProbity 47, and the atomic coordinates were deposited in the Protein Data Bank. Structure figures were prepared with PyMOL 48.
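The two sitting-drop set-ups described under Crystallisation used opposite protein-to-reservoir ratios (100 + 50 nL versus 50 + 100 nL), which changes the starting composition of each drop. The sketch below computes initial drop concentrations from simple volume fractions. It assumes ideal mixing and ignores the subsequent vapour-diffusion equilibration.

```python
def drop_concentration(stock_conc, component_nl, total_nl):
    """Initial concentration of a component after mixing into a drop."""
    return stock_conc * component_nl / total_nl


drops = {
    # name: (protein mg/mL, protein nL, reservoir nL)
    "CDK12 715-1038/CycK": (4.3, 100, 50),
    "CDK12 715-1052/CycK": (6.0, 50, 100),
}
for name, (protein, prot_nl, res_nl) in drops.items():
    total = prot_nl + res_nl
    print(
        name,
        f"protein {drop_concentration(protein, prot_nl, total):.2f} mg/mL,",
        f"PEG3350 {drop_concentration(20.0, res_nl, total):.1f}%,",
        f"AMP-PNP {drop_concentration(1.0, prot_nl, total):.2f} mM",
    )
```

Because AMP-PNP was added only to the protein solution in the first set-up, its starting drop concentration scales with the protein fraction; in the second set-up the cryo-protectant later restored nucleotide at 3 mM.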