Reading and erasing of the phosphonium analogue of trimethyllysine by epigenetic proteins

Nε-Methylation of lysine residues in histones plays an essential role in the regulation of eukaryotic transcription. The ‘highest’ methylation mark, Nε-trimethyllysine, is specifically recognised by Nε-trimethyllysine binding ‘reader’ domains, and undergoes demethylation, as catalysed by 2-oxoglutarate dependent JmjC oxygenases. We report studies on the recognition of the closest positively charged Nε-trimethyllysine analogue, i.e. its trimethylphosphonium derivative (KPme3), by Nε-trimethyllysine histone binding proteins and Nε-trimethyllysine demethylases. Calorimetric and computational studies with histone binding proteins reveal that H3KP4me3 binds more tightly than the natural H3K4me3 substrate, though the relative differences in binding affinity vary. Studies with JmjC demethylases show that some, but not all, of them can accept the phosphonium analogue of their natural substrates and that the methylation state selectivity can be changed by substitution of nitrogen for phosphorus. The combined results reveal that very subtle changes, e.g. substitution of nitrogen for phosphorus, can substantially affect interactions between ligand and reader domains / demethylases, knowledge that we hope will inspire the development of highly selective small molecules modulating their activity.

The interplay between KMTs and KDMs regulates lysine methylation status, which in turn regulates binding of methylation state-specific chromatin binding modules. Four identified non-catalytic domains interact with N ε -trimethyllysines: plant homeodomains (PHD) (Fig. 1c), tandem tudor domains (TTD), chromodomains (CHD), and malignant brain tumour (MBT) proteins 3 , all of which bind N ε -trimethyllysine in a cage comprised of typically hydrophobic and aromatic residues 4 . Experimental and computational studies have shown that binding of N εtrimethyllysine by readers is driven by cation-π interactions between the positively charged quaternary ammonium group of N ε -trimethyllysine and electron-rich aromatic residues and by release of water molecules from the cage [5][6][7][8][9] .
Misregulation of histone modification is linked to human disease. For example, DNA encoding for the PHD3 finger of the KDM5A demethylase can fuse with that of nuclear pore protein 98 (NUP98) leading to the NUP98-KDM5A-PHD3 fusion protein, which is linked to acute myeloid leukaemia 10 . BPTF, a core subunit of the ATP-dependent nucleosome remodelling factor (NURF), can fuse with NUP98 to result in primary refractory acute megakaryoblastic leukaemia protein 11 . Other readers forming NUP98 fusion proteins include PHF23, NSD1, and NSD3 12,13 .
Research on KMTs and KDMs 4 has led to potent and partially selective inhibitors, of use in studying their roles and therapeutic potential. However, there remain challenges in achieving selectivity for particular KDM isoforms (there are >25 JmjC oxygenases and >60 2OG oxygenases in humans 14,15 ). Development of selective inhibitors for chromatin binding modules, which are often present in epigenetic writers and erasers, has been particularly challenging, with reported inhibitors of the >100 human PHD 16 and TTD 17 domains being relatively weak and nonselective binders [18][19][20] . Knowledge of the selectivity of ligand binding by chromatin binding proteins and modifying enzymes is thus of both fundamental and medicinal interest 21 . To investigate the extent to which histone N ε -methyllysine readers and erasers can manifest selectivity, we synthesised a peptide containing the simplest possible positively charged N ε -trimethyllysine analogue, i.e., ε -trimethylphosphonium lysine (K P me 3 ), and studied its interactions with histone N ε -methyllysine readers and erasers (Fig. 1). The results reveal that, at least some, readers and erasers can discriminate between K P me 3 and Kme 3 peptides, suggesting that identification of drug-like selective inhibitors should be possible.
Interestingly, for four of the readers, ITC experiments with H3K P 4me 3 indicated stronger complex formation than with H3K4me 3 ( Table 1). The largest increase was observed with KDM4A TTD , which manifested~12-fold stronger binding with H3K P 4me 3 compared to H3K4me 3 . SGF29 TTD exhibits comparable binding affinity for H3K4me 3 and H3K P 4me 3 ; note that it is the only reader tested not possessing a Trp residue in its hydrophobic cage 24 . This result correlates with the observed unusually strong binding of the neutral N ε -trimethyllysine carbon-analogue to SGF29 TTD compared with other reader proteins 5 . The increase in affinity for H3K P 4me 3 relative to H3K4me 3 is generally a result of a more favourable ΔH°, with the values for the ΔS°remaining largely unchanged for four of the five readers, the exception being KDM4A TTD , (Table 1).
Although the observed decreases in ΔH°are small (ΔΔH°: − 0.4 to −1.3 kcal mol −1 ) for KDM5A PHD3 , TAF3 PHD , BPTF PHD and SGF29 TTD , the decrease for H3K P 4me 3 relative to H3K4me 3 is relatively large (ΔΔH°: −4.7 kcal mol −1 ) for KDM4A TTD . The more favourable ΔH°for binding for H3K P 4me 3 over H3K4me 3 implies more favourable cation-π interactions between the trimethylphosphonium cation and the electron-rich aromatic cages, as found in related systems 5,7,8 . The longer C-P bond (1.87 Å) in H3K P 4me 3 compared to H3K4me 3 (C-N bond in H3K4me 3 is 1.47 Å) and increased volume ( + Pme 4 : 115 Å 3 , + Nme 4 : 105 Å 3 ) 28 may help position the methyl hydrogens of the quaternary phosphonium cation closer to the aromatic cage residues. Note that, the limited added volume of H3K P 4me 3 compared to H3K4me 3 means both are likely to release the same number of water molecules from the cages, suggesting equal contributions to affinity due to reader desolvation. Overall, the ITC results imply that the readers efficiently recognise the phosphonium analogue of N ε -trimethyllysine: importantly, despite the subtle nature of the difference between H3K P 4me 3 compared to H3K4me 3 , differences in the relative binding efficiencies of the readers for the peptides were observed.
Molecular dynamics simulations of histones with readers. We used molecular dynamics (MD) simulations to study how the readers bind to H3K4me 3 and H3K P 4me 3 . The N ε -trimethyllysine residue of H3K4me 3 in structures of reader protein complexes was replaced with K P me 3 with solvation in a 10 Å truncated octahedral box of TIP3P water 29 and neutralised explicitly with sodium or chloride ions. AMBER12 30 was used to simulate the Table 1 Thermodynamic parameters for association of H3K4me 3 and H3K P 4me 3 peptides (ART(Kme 3 /K P me 3 )QTARKS) with five human reader domains a .  3 and Kme 3 complexes, respectively ( Table 2). The Kme 3 and K P me 3 side chains in the modelled complexes have similar conformations, despite the larger size of P (Table 2 and Supplementary Fig. 8). The models imply that both the Kme 3 and K P me 3 side chains undergo only small deformations on TRP2 binding, as reflected in the strain energies: ΔE(aq) strain : 0.1 and 0.8 kcal mol −1 for the Kme 3 and K P me 3 complexes, respectively. The preference for K P me 3 over Kme 3 is also manifested in the absence of water, although to a lesser extent: ΔE int : −27.6 and −28.0 kcal mol −1 for the Kme 3 and K P me 3 complexes, respectively. This result supports the above proposal that, energetically, desolvation effects (ΔE(desolv) int ) are similar for Kme 3 and K P me 3. We investigated why the TRP2 aromatic cage interacts more favourably with K P me 3 than Kme 3 , using quantitative Kohn-Sham molecular orbital (KS-MO) theory and energy decomposition analysis (EDA) of ΔE int ( Table 2). The results imply that the more stabilizing interaction ΔE int for H3K P 4me 3 originates from more attractive electrostatic (by 0.8 kcal mol −1 ), orbital (by 0.6 kcal mol −1 ), and dispersion (by 1.6 kcal mol −1 ) interactions. The stronger electrostatic attraction of K P me 3 is due to the somewhat more positively charged methyl H atoms of the phosphonium group (Fig. 4a). The more attractive ΔE oi term in TRP2-K P me 3 results from stronger, more stabilizing donor-acceptor orbital interactions from π orbitals to the σ* C-H type orbitals on the K P me 3 side chain: the charge transfer is 0.09 electrons to K P me 3 and only 0.04 electrons to Kme 3 (Fig. 4b). The preference for K P me 3 is caused by the lower energy of the σ* C-H type orbitals of K P me 3 and their better overlap with π orbitals (Supplementary Table 3). Our bonding analyses show that these cation-π interactions can be viewed as cationic CH-π interactions. Note that the more favourable bonding terms in the TRP2-K P me 3 complex leads to a shorter d(H Me -C TRP-5MR ) distance between the phosphonium group and the cage, which slightly amplifies all interaction terms, including the steric (Pauli) repulsion (by 2.7 kcal mol −1 , Table 2).  We carried out EDA analyses for the homolytic formation of the C-H bonds in Kme 3 and K P me 3 at the BLYP-D3BJ/TZ2P level (Supplementary Table 4 and Supplementary Fig. 9). The larger proton affinity for Kme 3 compared to K P me 3 is maintained both in solution and in the gas phase. The EDA results imply that this derives substantially from the more favourable electrostatic interactions for Kme 3 compared to K P me 3 (by 5.4 kcal mol −1 ), even though the orbital interactions are more favourable for K P me 3 , though only by 1.2 kcal mol −1 . The VDD charge on CMe is −268 me and on N is +42 me, whereas for K P me 3 the VDD charge for CMe is −348 me and +302 me on P. The difference in homolytic formation of C-H bonds for Kme 3 or K P me 3 thus seems to be a subtle interplay of a more favourable (i.e. more negative) charge on the methyl carbon plus a less favourable (more positive) charge on the P atom in K P me 3 .
The H3K9me 3 results show little variations in product profiles indicating that the reagents are experimentally effective. With H3K P 9me 3 where H3K P 9me 2 and H3K P 9me 2 O are produced, on initial quenching with EDTA or 2,4-PDCA (which inhibits by chelating to Fe) we observed a slow increase in the peak corresponding to H3K P 9me 2 O, but not H3K P 9me 2 . The combined observations imply that slow production of H3K P 9me 2 O from H3K P 9me 2 can occur via non-enzymatic as well as enzymatic oxidation. To verify that products detected using MALDI-TOF MS are not instrumental artefacts, time-course measurements were performed on H3K9me 3 and H3K P 9me 3 with analysis by LC-MS. Similar demethylation and oxidation patterns, including production of H3K P 9me 2 and H3K P 9me 2 O from H3K P 9me 3 were detected as observed with MALDI-TOF MS ( Supplementary Fig. 15).
To directly compare the efficiency of demethylation of H3K9me 3 (5.0 µM) and H3K P 9me 3 (5.0 µM), they were incubated with KDM4E JmjC (0.5 µM) in the same vessel ( Supplementary  Fig. 16). H3K9me 3 was converted to H3K9me 2/1 with the same efficiency as the control, but there was no evidence that H3K P 9me 3 was converted to H3K P 9me 2 or H3K P 9me 2 O, although low levels of formation of H3K P 9me 2 cannot be ruled out as its peak overlaps with an isotope peak of H3K9me 3 . The combined results show H3K9me 3 is a substantially better substrate than H3K P 9me 3 .
KDM4E JmjC -catalysed demethylation of H3K P 9me 3 was analysed by 1 H and 31 P NMR; in both cases, the reaction proceeded to give signals corresponding to H3K P 9me 2 and H3K P 9me 2 O (Fig. 6). In the 31 P NMR, distinct resonances were observed for H3K P 9me 3 (25.9 ppm), H3K P 9me 2 H (28.4 ppm), and H3K P 9me 2 O (55.6 ppm). Notably, a different 31 P resonance (δ P : 28.4 ppm compared to 55.6 ppm) was observed when quenching the H3K P 9me 3 reaction with HCl (1 M); this was assigned as a protonated H3K P 9me 2 H species, as supported by 1 H-31 P HMBC NMR, and comparison of chemical shifts with those for similar species, i.e. PMe 3 , PMe 3 H + , and P(O)Me 3 ( Supplementary Fig. 17), under identical conditions. 31 P and 1 H NMR time-course studies confirmed demethylation and conversion of 2OG to succinate ( Supplementary Fig. 18).

Discussion
Methylation of carbon, nitrogen, oxygen and sulphur atoms in large and small biomolecules is of central biological importance; methyl groups linked via heteroatoms are common in drugs and agrochemicals. Alkylated phosphines are commonly used in organic synthesis, e.g., in Wittig reagents. It is thus perhaps surprising that methylphosphonium and related chemistry has, to our knowledge, not been more widely investigated in biochemistry 43,44 , in particular with respect to the possibility of demethylation.
Our studies on interactions between H3K4me 3 and H3K P 4me 3 and readers demonstrate that H3K P 4me 3 can substitute for H3K4me 3 45 , in most cases with increased affinity, due to stronger cation-π interactions (bonding analyses reveal true cationic CH-π interactions). Notably, there are differences in the relative binding efficiencies of the readers with H3K4me 3 compared to H3K P 4me 3 , implying selective inhibition of readers by small drug-like molecules should be feasible. Similar observations have been made in relation to cation-π interactions between tetramethylammonium compounds and their tetramethylphosphonium analogues with respect to binding to aromatic cavities. Related studies with γ-butyrobetaine 43 and the serine protease factor Xa 44 suggest our observations may be of a general nature.
The results with H3K P 4me 3 contrast those for other H3K4me 3 derivatives binding to readers, where typically comparable or lower affinities are observed ( Supplementary Fig. 24, Supplementary Table 5) compared to H3K4me 3 . For example, studies comparing binding of H3K4me 3 and H3K C 4me 3 to TAF3 PHD and KDM4A TTD reveal impaired binding of H3K C 4me 3 (ΔK d values of~2-fold) 46 . By contrast, an increase in, or comparable, stability is observed for the H3K P 4me 3 -reader complexes relative to the H3K4me 3 -reader complexes, with some showing much tighter binding (BPTF PHD , ΔK d :~7-fold and KDM4A TTD, ΔK d :~12-fold). For comparison, the difference in binding between H3K4me 3 and unmodified-lysine is protein and condition dependent, but typically the ΔK d is >20-fold in favour of H3K4me 3 10,23-25,47,48 . Even more pronounced decreases in binding affinities are observed with KDM5A PHD3 and TAF3 PHD 5 when the K4me 3 in H3K4me 3 is substituted for glycine, highlighting the importance of the lysine side chain in binding. Thus, the substitution of H3K4me 3 for H3K P 4me 3 can have a positive effect on binding, knowledge that might be exploited in inhibitor design.
Previous studies revealed that some JmjC KDMs can catalyse oxidation of substrates other than the established N ε -methylated lysine substrates, e.g., H3K9me 3/2 for KDM4E, as demonstrated with N ε -methyl-ethyl-lysine-9, a substrate that undergoes both demethylation and deethylation 49 . However, analysis with N εdiethyllysine showed no evidence of reaction, demonstrating limitation of the plasticity of the KDM4E active site towards alkylated lysine substrates. Some JmjC KDMs can also catalyse Nmethyl arginine demethylation and with appropriately sized N εsubstitutions some can catalyse formation of stable alcohol products 49,50 . We found that H3K P 9me 3 is a demethylation substrate for human KDM4A/D/E to give H3K P 9me 2 ; this observation is consistent with the relatively small increase in volume when H3K P 9me 3 is compared to H3K9me 3 (Δ[ + Pme 4 -+ Nme 4 ]: 10 Å 3 ) 28 , though the demethylation rate is significantly slower for H3K P 9me 3 than for H3K9me 3 . Strikingly, although KDM4 enzymes (KDM4A/D/E) catalysed formation of H3K P 9me 2 , they did not catalyse its further demethylation to give H3K P 9me 1 , despite efficient conversion of H3K9me 2 to H3K9me 1 . We propose that this, at least in part, is due to the decreased pK a of H3K P 9me 2 versus H3K9me 2it seems that, at least for the KMD4 JmjC KDMs, the positively charged form of N ε -dimethyllysine H3K9me 2 is the preferred substrate. Interestingly, we also observed conversion of H3K P 9me 2 to H3K P 9me 2 O, possibly in part by non-enzymatic oxidation; we saw no evidence for formation of the analogous H3K9me 2 O N-oxide.
We also investigated whether the JmjC-domain of KDMs, which notably accept H3K9me 2 can accept H3K P 4me 2 as a substrate. Since H3K P 9me 2 peptides are difficult to synthesise due to reactivity of the phosphine, we generated H3K P 9me 2 in situ from H3K P 9me 3 using KDM4E JmjC . The results with KDM3A and KDM7B, which naturally catalyse H3K9me 2 demethylation, provide clear evidence they do not catalyse demethylation of H3K P 9me 2 , revealing the ability of JmjC KDMs to accept P-analogues is subfamily dependent. As with the results for the readers, the results with JmjC KDMs show very small changes to the substrate, likely due to changes in size or charge, can make large differences in substrate selectivity. We hope that this knowledge will inspire medicinal chemistry efforts to identify JmjC KDM isoform specific inhibitors.
Phosphorous is essential for all life forms where it is principally found in its oxidised phosphate form, in nucleic acids, small molecules (e.g., ATP, NADPH), proteins and lipids, amongst other molecules. Alkylated phosphine compounds have, to our knowledge, not been identified in biology. In part this may be due to their tendency to be oxidized, as evident in our work where evidence for KDM4E-catalysed oxidation at H3K P 9me 2 to give H3K P 9me 2 O, rather than H3K9me 1 was accrued. However, phosphine (PH 3 ) is present in the Earth's atmosphere where it is proposed to be part of the phosphorus cycle 51 and, may be present in the atmosphere of Venus 52 . Our results show that at least several related enzymes can act on reduced phosphine derivatives, highlighting the possibility that reduced phosphine derivatives might, at least in some specialised contexts, have a biological role, and/or that they may have played a role in the evolution of biology.
Peptide synthesizer approach. Peptides were synthesized using a Liberty Blue microwave assisted solid phase peptide synthesizer (CEM corporation). The coupling steps were carried out using DIC and Oxyma in DMF in a microwave vessel at 90°C. Each coupling step was performed in DMF with an excess of Fmocprotected amino acids (5.0 equivalents). Note that the coupling step of Fmoc-K P me 3 -OH (2.0 equivalents) was performed manually, using HATU (2.5 equivalents) in DMF for 16 h at rt. Subsequently, any free N-terminal amine was capped using Ac 2 O (2.0 equivalents), and pyridine (2.4 equivalents) before treatment with piperidine.
Cleavage of the peptides was achieved using a mixture of TFA 92.5%, H 2 O 2.5%, triisopropylsilane 2.5%, and ethane-1,2-dithiol 2.5%; the product was precipitated from Et 2 O after 3-4 h. The crude product was suspended in Et 2 O, then centrifuged (3500 rpm, 4 minutes); the supernatant was then decanted (3 times). Purification of the peptides was performed by preparative HPLC. Analysis of the peptides was done by LC-MS and analytical HPLC. Conditions for a typical HPLC purification run were starting conditions: MeCN (3%) in H 2 O (both supplemented with 0.1% (v/v) TFA), a gradient to 100% MeCN over 30 minutes. Sample fractions were pooled based on the results of LC-MS analysis, then lyophilised to yield the desired product as a fluffy powder.
Isothermal titration calorimetry. ITC studies followed a reported procedure 26  . Experiments were conducted using ITC200 automated (GE Healthcare Life Sciences, USA) instrument at 25°C. Histone peptide titrations were performed with the same reader batches. Solutions of the reader in buffer (25-40 µM) and of the histone H3 peptide (350-600 µM) in buffer were prepared. The prepared solutions were plated into a 96-well plate and inserted into the instrument for analysis. Experiments were performed according to manufacturer's default settings: Plate pre-rinse syringe clean. A total of 19 injections were performed; each experiment was repeated 3-5 times. Heats of dilution for histone peptides determined in control experiments were subtracted from the titration binding data. Data were analysed with Origin 6.0 (Microcal Inc., Northampton, Massachusetts, USA) and curve fitting with one-site binding mode was applied.
Quantum chemical analysis. Quantum chemical calculations were performed with the Amsterdam Density Functional software (ADF) 53 using dispersioncorrected density functional theory at the BLYP-D3BJ/TZ2P level of theory 54 . Our BLYP-D3BJ/TZ2P approach provided results that are in excellent agreement with those of a recent high-level CCSD(T) benchmark study by Varma and coworkers (Supplementary Table 6) 55 . Solvation in water was simulated by means of the conductor like screening model (COSMO) of solvation implemented in ADF [56][57][58][59] . The cation-π interactions in TRP2-H3K4me 3 and TRP2-H3K P 4me 3 complexes were analysed through quantitative Kohn-Sham molecular orbital theory combined with energy decomposition analysis (EDA) 60,61 . In this method the bond energy in water ΔE(aq) is a combination of the strain energy (ΔE strain (aq)) associated with deforming the cation and the reader from their equilibrium structures to the geometry they adopt in the complex, combined with the interaction energy (ΔE int (aq)) between these deformed fragments in the complex: The role of desolvation in the complexation process can be analysed by splitting the solute-solute interaction (ΔE int (aq)) into the effect caused by the change in solvation (ΔE int (desolv)) and the remaining intrinsic interaction (ΔE int ) between the unsolvated fragments in vacuum: The interaction energy ΔE int can be further decomposed by: where, ΔV elstat corresponds to the classical electrostatic interaction between the unperturbed charge distributions of the deformed fragments, which is usually attractive. The Pauli repulsion (ΔE Pauli ) term comprises the destabilizing interactions between occupied orbitals and is responsible for steric repulsions. The orbital interaction (ΔE oi ) accounts for charge transfer (donor-acceptor interactions between occupied orbitals on one moiety with unoccupied orbitals of the other, including the HOMO-LUMO interactions) and polarization (empty/occupied orbital mixing on one fragment due to the presence of another fragment). Finally, the ΔE disp term accounts for the dispersion interactions based on Grimme′s DFT-D3BJ correction. The charge distribution was analysed using the Voronoi deformation density (VDD) method 62 .
Molecular dynamics simulations. MD simulations were carried out for 10 ns. Crystal structures for the models representing TAF3 PHD (PDB: 2K17), KDM4A TTD (PDB: 2GFA), KDM5A PHD3 (PDB: 2KGI), BPTF PHD (PDB: 2F6J), and SGF29 TTD (PDB: 3ME9) readers were used as starting structures for the protein-ligand modelling. Starting structures were built by manually replacing the Kme 3 residue of H3K4me 3 with K P 9me 3 residue in the reader protein crystal structures complexes. AMBER12 30 was used with the Amberff12SB force field to define protein partial charges. Hydrogen atom addition was performed with LEaP. Systems were solvated in a 10 Å truncated octahedral box of TIP3P 29 water and neutralised explicitly with either sodium or chloride counterions. Non-bonding parameters of Zn(II), previously established from studies of KDM4A 63 , were employed. Atomic partial charges for H3K p 9me3 correspond to the Restrained Electrostatic Potential (RESP) 64 charges, as shown in Supplementary Table 2. Parameters for Kme3 were taken from previous work 31 . The final systems were minimised for 1,000 cycles of steepest-descent minimization followed by 1,000 cycles of conjugate-gradient minimization to remove close van der Waals contacts using the sander program in AMBER12. Equilibration was achieved using PMEMD to heat the systems to 310 K followed by independent MD simulations performed with a periodic boundary condition at a constant pressure of 1 atm with isotropic molecule-based scaling at a time step of 2.0 fs. All simulations used a dielectric constant of 1.0, Particle Mesh Ewald summation 65 to calculate long-range electrostatic interactions and bondlength constraints applied to all bonds to H atoms. Trajectories were saved at 20 ps intervals and visualised using VMD 66 .
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.