Main

Small molecules can serve as versatile probes for perturbing the functions of proteins in biological systems and are a primary source of therapeutic agents to treat human disorders1. Nonetheless, most human proteins still lack selective chemical ligands and some classes of proteins are even considered undruggable2. Covalent ligands offer one strategy to expand the landscape of proteins amenable to targeting by small molecules. By combining features of recognition and reactivity, covalent ligands have the potential to target sites on proteins that are difficult to address by reversible binding interactions alone3. While original covalent probes often target essential catalytic residues within the active sites of enzymes, in particular, serine4 and cysteine5 residues of enhanced nucleophilicity, more recent successes in covalent ligand development include electrophilic small molecules that react with non-catalytic cysteines across diverse protein classes, including kinases6,7, GTPases8 and non-enzymatic proteins (for example, nuclear export factors9). These efforts have culminated in the approval of several covalent kinase inhibitors as drugs for treating diverse cancers6,7.

In attempts to understand the scope of proteins that may be targeted by covalent ligands, we recently evaluated the proteome-wide reactivity of a diverse set of cysteine-directed electrophilic fragments, which were found, as a collection, to engage cysteine residues on hundreds of proteins in human cell systems10. These proteins originated from diverse classes, including those deemed historically challenging to target with small molecules (for example, adaptor proteins and transcription factors). The total number of proteins harbouring liganded cysteines, however, still accounted for only ~20% of all proteins quantified in the study, suggesting that the realization of a more complete ligandability map of the human proteome may require extending beyond cysteine as a source for covalent probe development.

Among proteinaceous amino acids, lysine represents a potentially attractive candidate for covalent ligand development, as the lysine ε-amine is intrinsically nucleophilic and lysines are found at many functional sites, including enzyme active sites11,12 and at interfaces mediating protein–protein interactions13. Lysines also frequently serve as sites for post-translational regulation of protein structure and function through, for instance, acetylation14, methylation15,16 and ubiquitylation17. Individual lysine residues within functional protein pockets are susceptible to modification by electrophilic small molecules, including natural products such as wortmannin18, which targets a lysine in the active sites of PI3K kinases, activated esters that react with a lysine in transthyretin (TTR)19 and boronic acid carbonyl antagonists of the apoptosis regulatory protein MCL-1 (ref. 13). Additional electrophiles that have been shown to react with proteinaceous lysine residues include dichlorotriazines20,21, imidoesters22, 2-acetyl- or 2-formyl-benzeneboronic acids13,23, isothiocyanates24,25, pyrazolecarboxamidines26,27, sulfonyl fluorides28,29 and vinyl sulfonamides30.

Despite the aforementioned examples, the full spectrum of functional and ligandable lysines in the human proteome remains poorly understood. Building on previous work describing a chemical proteomic platform for assessing cysteine reactivity on a global scale31, initial attempts have been made to assess lysine reactivity in human proteomes, but these data sets, which were generated using aryl halide probes, were limited to quantifying a small number of lysines (<100) (ref. 21). Given the frequency of lysine residues in human proteins (~6% of all residues32), we hypothesized that the development of more advanced chemical proteomic methods capable of quantifying a much larger number of lysines in human proteomes would provide a deeper and more complete portrait of lysine reactivity and ligandability, as well as the potential relationship between these two parameters. Here, we show that an amine-reactive pentynoic acid sulfotetrafluorophenyl ester probe provides access to a very rich content of lysines (>9,000 residues in total) in the human proteome. We use this probe to quantify lysine reactivity and ligandability on a global scale, leading to the discovery of functional lysines that can be targeted by covalent ligands to perturb the activities of a diverse range of proteins.

Results

A chemical proteomic method for assessing lysine reactivity

We have previously described a quantitative and site-specific chemical proteomic method termed ‘isoTOP-ABPP’ (isotopic tandem orthogonal proteolysis-activity-based protein profiling) for measuring cysteine reactivity in native proteomes31. Here, we reasoned that exchanging the cysteine-directed iodoacetamide alkyne probe for a probe that shows preferential reactivity with amines would afford a platform for global lysine reactivity analysis (Fig. 1a). Among candidate amine-reactive groups, we considered activated esters as a good potential probe class, as they should show preferred reactivity with amines, display good solubility, and form stable, structurally simple adducts with proteinaceous lysines for characterization by MS methods. In an initial screen of alkyne-modified ester probes (1–15, Supplementary Fig. 1), we found that sulfotetrafluorophenyl (STP) and N-hydroxysuccinimide (NHS) esters showed strong proteomic reactivity, as evaluated by copper-catalysed azide-alkyne cycloaddition (CuAAC, or click chemistry33) to a rhodamine-azide tag, SDS–PAGE and in-gel fluorescence scanning (Supplementary Fig. 1). Considering that tetrafluorophenyl esters are more stable in aqueous solution than NHS esters34, we selected STP-alkyne 1 as a probe for proteomic profiling of lysine reactivity.

Figure 1: Proteome-wide quantification of lysine reactivity.

a, General protocol for lysine reactivity profiling by isoTOP-ABPP. Cellular lysates are labelled with an amine-reactive STP alkyne probe (1) at different concentrations. Labelled samples are conjugated to isotopically differentiated TEV protease-cleavable biotin tags (heavy (blue) and light (red) for 0.1 and 1.0 mM probe 1 treatment groups, respectively) by CuAAC33, mixed, and 1-labelled proteins enriched by streptavidin-conjugated beads and digested stepwise with trypsin and TEV to yield 1-labelled peptides for LC-MS analysis. Isotopic ratios (R values) reflect the relative MS1 chromatographic peak intensities for 1-labelled peptides in light versus heavy samples. b, Probe 1 preferentially labels lysine residues in human cell proteomes. Residues labelled by 1 were assigned by differential modification analysis of all quantified peptides identified in three replicate experiments comparing 0.1 versus 0.1 mM probe 1 treatments of MDA-MB-231 cell lysates. Peptides were required to feature no missed cleavage sites on unmodified lysine residues. Data represent means ± standard deviation for three experiments. c, R values for probe 1-labelled peptides from human cancer cell proteomes (MDA-MB-231, Ramos and Jurkat) treated with 0.1 versus 1.0 mM (black) or 0.1 versus 0.1 mM (grey) of probe 1. Representative MS1 chromatographic peaks for lysines of different reactivity categories are shown as insets (high, or hyper-reactive, R < 2.0 (K319 of CCT4); medium, 2.0 < R < 5.0 (K156 of XRCC5); low, R > 5.0 (K420 of ENO1)). d, Number of hyper-reactive and quantified lysines per protein shown for proteins found to contain at least one hyper-reactive lysine. e, Hyper-reactive lysines are site-selectively labelled by activated ester probes. HEK293T cells expressing representative proteins with hyper-reactive lysines (or the corresponding lysine-to-arginine mutant) as Flag epitope-tagged proteins were treated with the indicated lysine-reactive probe and analysed by gel-based ABPP.

We initially assessed the scope and selectivity with which 1 reacted with lysine residues in human cell proteomes. Two equal amounts of proteomic lysate from the human breast cancer cell line MDA-MB-231 were treated with 1 (100 µM, 1 h), CuAAC-conjugated to isotopically differentiated tobacco etch virus protease (TEV)-cleavable, azide–biotin tags (heavy and light, respectively), combined and analysed by isoTOP-ABPP. Measurement of the MS1 chromatographic peak ratios for isotopically differentiated light/heavy peptide pairs provided an isoTOP-ABPP ratio, or R value, which centred on ~1.0 for the more than 5,000 quantified, probe 1-labelled peptides. As determined by tandem MS and differential modification analysis, >52% of 1-labelled peptides were assigned as being uniquely modified on lysine residues, with 54% of the remaining 1-labelled peptides being assigned with lysine modifications as well as alternative residue modifications. Because lysine modification creates a missed trypsin cleavage site, we further assessed the fraction of alternative amino-acid modification assignments for their occurrence on peptides harbouring a missed lysine cleavage site. We found that most of the predicted non-lysine modifications for 1 occurred on peptides with missed lysine cleavage sites (Supplementary Fig. 1), indicating that they probably represent mis-assignments of reactivity events that actually occurred on lysine. Once the isoTOP-ABPP data were filtered to remove peptide assignments with unmodified, missed lysine cleavage events, lysine accounted for the vast majority of all assignments for probe 1 modification (Fig. 1b). The remaining alternative probe 1 modifications were mostly assigned to serine (~8% of the total 1-labelled peptides) and these occurred on fully digested tryptic peptides (Fig. 1b), probably designating them as authentic modifications. These results, taken together, indicate that 1 shows broad reactivity and good selectivity for lysine residues in the human proteome.

Quantitative profiling of lysine reactivity in human cell proteomes

Previous isoTOP-ABPP studies have shown that the human proteome possesses a specialized set of hyper-reactive cysteine residues that are enriched in functional residues (for example, catalytic residues, redox-active residues) compared to bulk cysteine content31. Here, we assessed the intrinsic reactivity of lysine residues in human cell proteomes by comparing their concentration-dependent labelling with probe 1, where highly reactive lysines would be expected to show nearly equivalent labelling intensities at low versus high concentrations of probe 1, with less reactive lysines displaying clear concentration-dependent increases in labelling intensity. In brief, we treated proteomes from three human cancer cell lines (MDA-MB-231, Ramos and Jurkat cells) with low versus high concentrations of probe 1 (0.1 versus 1 mM, n = 4 per group) for 1 h and then analysed the samples by isoTOP-ABPP, wherein high, medium and low reactivity lysines were distinguished by their respective isotopic ratio values (R10:1 < 2, 2 < R10:1 < 5, R10:1 > 5, respectively). To minimize false quantification events, we also required that lysines were detected in control (0.1 versus 0.1 mM) experiments with R1:1 values of ~1.0 (see Supplementary Methods for additional details).
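The reactivity binning described above can be illustrated with a minimal Python sketch. It assumes only the R10:1 thresholds stated in the text; the function name and the handling of values falling exactly on 2 or 5 (which the text leaves unspecified) are illustrative choices, not part of the published analysis pipeline.

```python
def classify_reactivity(r_10_to_1: float) -> str:
    """Bin a lysine by its R10:1 value (signal at 1.0 mM / signal at 0.1 mM probe 1).

    Thresholds follow the text: high (hyper-reactive) R < 2, medium 2 < R < 5,
    low R > 5. Assumption: exact boundary values (2.0, 5.0) are assigned to the
    less reactive bin, because the text does not say how ties are handled.
    """
    if r_10_to_1 <= 0:
        raise ValueError("an isotopic ratio must be positive")
    if r_10_to_1 < 2.0:
        return "high"
    if r_10_to_1 < 5.0:
        return "medium"
    return "low"


# A lysine that is nearly saturated at 0.1 mM probe gives a ratio close to 1,
# whereas a lysine labelled in proportion to probe concentration approaches 10.
for ratio in (1.2, 3.5, 9.0):
    print(ratio, classify_reactivity(ratio))
```

In practice the control filter described above (R1:1 of ~1.0 in 0.1 versus 0.1 mM experiments) would be applied before any such binning.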

In total, ~4,000 lysine residues were assessed for intrinsic reactivity across the three tested cell lines (Supplementary Fig. 2), and individual lysines showed consistent reactivity values for replicate experiments performed within (Supplementary Fig. 2) and across these cell lines (Supplementary Fig. 2). The majority of quantified lysines showed strong, concentration-dependent increases in reactivity with probe 1, indicative of residues with low intrinsic reactivity (Fig. 1c). In contrast, a rare subset of the quantified lysines (<10%, or 310 total residues) exhibited heightened (hyper-) reactivity with probe 1 (R10:1 < 2) (Fig. 1c). Most proteins contained only one hyper-reactive lysine among several quantified lysines (Fig. 1d), and the atypical hyper-reactivity of these lysines was further supported by comparing their R10:1 values to those of other lysines quantified on the same protein (Supplementary Fig. 2). We confirmed the lysine hyper-reactivity determinations made by isoTOP-ABPP by recombinantly expressing wild-type (WT) and lysine-to-arginine mutant proteins and comparing their reactivity by gel-based ABPP using fluorescent or alkyne-tagged activated ester probes (Supplementary Fig. 1). Each protein examined showed strong labelling with activated ester probes and the labelling of one or more of these probes was generally blocked, in many cases completely, by mutation of the hyper-reactive lysine to arginine (Fig. 1e, Supplementary Fig. 2 and Supplementary Table 1). Considering that there were, on average, 30 lysine residues per examined protein, the blockade of activated ester probe labelling by mutation of a single lysine in each protein underscores the unusual hyper-reactivity of these residues.

Features of hyper-reactive lysines

Hyper-reactive lysines were found on proteins from all major classes and showed a distribution similar to those of less reactive lysines (Fig. 2a). Hyper-reactive lysines were not, as a group, more conserved across organisms than lysines of lower reactivity, although this analysis was complicated by the high median conservation (~80%) of all 1-labelled lysines across the species examined (Supplementary Fig. 3). The primary sequence surrounding hyper-reactive lysines also did not show evidence of any obvious conserved motifs (Supplementary Fig. 3), indicating that higher-order structural features in proteins are probably imparting enhanced reactivity on these lysines. Consistent with this hypothesis, the frequency of lysines found in functional sites on proteins (for example, enzyme active sites, ligand-binding sites), as assessed by analysis of three-dimensional protein structures, was positively correlated with reactivity (Fig. 2b). Protein pockets of uncharacterized function (as defined by AutoSite35 analysis of protein structures) also contained a greater percentage of hyper-reactive lysines compared to less reactive lysines (Supplementary Fig. 3). Interestingly, we observed a striking inverse correlation between lysine reactivity and evidence of ubiquitylation as reported in the PhosphoSitePlus database36 (Fig. 2c), and a similar, albeit more tempered trend was found for lysine acetylation (Supplementary Fig. 3). These data, taken together, indicate that the localization of lysines to pockets on proteins may represent a prevalent mechanism for conferring heightened reactivity and such distributions may further hinder post-translational modification of the lysines, possibly due to limited surface exposure.

Figure 2: Global and specific assessments of the functionality of lysine reactivity.

a, Distribution of functional classes of proteins that contain hyper-reactive lysines compared to other quantified proteins lacking hyper-reactive lysines. b, Hyper-reactive lysines are enriched proximal to (within 10 Å) annotated functional sites for proteins that have X-ray or NMR structures in the Protein Data Bank (see Supplementary Methods for further details). c, Hyper-reactive lysines are less likely to be ubiquitylated than lysines of lower reactivity (ubiquitylated lysines were defined as those with ≥10 reported ubiquitylation events in public databases36). d, Mutation of hyper-reactive lysines blocks the catalytic activity of NUDT2 and G6PD and reduces the activity of PFKP. Data represent means ± standard deviation for three experiments. Statistical significance was calculated with unpaired Student's t-tests in comparison to WT activity: **P < 0.01, ***P < 0.001, ****P < 0.0001.

We examined whether some of the hyper-reactive lysines located in functional pockets contributed to protein activity. NUDT2, which is a diadenosine tetraphosphate hydrolase implicated in cancer and immune cell metabolism37, possesses a hyper-reactive lysine (K89) that is highly conserved and predicted, based on an NMR structure of NUDT2, to coordinate alpha-phosphate substrate binding38. However, to our knowledge, the contributions of K89 to NUDT2 catalysis have not been investigated. We found that mutation of K89 to arginine dramatically reduced the hydrolytic activity of NUDT2 (Fig. 2d). A similar disruption of catalysis was observed by mutation of the conserved, hyper-reactive lysine (K171) in the pentose phosphate pathway enzyme glucose 6-phosphate 1-dehydrogenase (G6PD) (Fig. 2d), which is consistent with previous findings39. Both K89 of NUDT2 and K171 of G6PD are active-site residues (Supplementary Fig. 3) and we therefore wondered whether hyper-reactive lysines located in potential allosteric pockets might also affect enzyme function. As a case study, we examined the hyper-reactive lysine (K688) in platelet-type phosphofructokinase (PFKP), which is located in an allosteric pocket >22 Å away from the active site (Supplementary Fig. 3). Mutation of K688 to arginine in PFKP produced a partial, but significant reduction in PFKP activity (Fig. 2d), pointing to a role for this lysine in allosteric regulation of PFKP function.

Quantitative profiling of lysine ligandability in human cell proteomes

We next applied isoTOP-ABPP in a ‘competitive’ format to assess the ligandability of lysines (Fig. 3a), where human cell proteomes were pre-treated with a small library (~30 members, 50–100 µM) of amine-reactive electrophilic fragments (activated esters, such as pentafluorophenyl- (19–28), dinitrophenyl- (29–45) and NHS esters (46) and N,N′-diacyl-pyrazolecarboxamidines (49,50)26,27) as well as one non-electrophilic control compound 51 (Fig. 3b and Supplementary Fig. 4) or DMSO control, followed by exposure to probe 1 (100 µM). Fragment-sensitive lysines were identified as those showing substantial reductions (≥75%) in enrichment by 1 in the presence of one or more fragments compared to the DMSO control (R ≥ 4 for DMSO/fragment).
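The correspondence between the ≥75% reduction in probe labelling and the R ≥ 4 cut-off follows from simple arithmetic, sketched below in Python. The sketch assumes that R is the DMSO/fragment ratio of MS1 peak intensities, as stated above; the function names are hypothetical and the clamping of ratios below 1 to zero blockade is an illustrative choice.

```python
LIGANDING_THRESHOLD = 4.0  # R (DMSO/fragment) cut-off given in the text


def percent_blockade(r_dmso_over_fragment: float) -> float:
    """Convert a competition ratio R into percent loss of probe 1 labelling.

    If the fragment-treated signal is 1/R of the DMSO signal, the fraction
    blocked is 1 - 1/R. Assumption: ratios below 1 (more labelling with
    fragment than with DMSO) are reported as 0% blockade.
    """
    if r_dmso_over_fragment <= 0:
        raise ValueError("an isotopic ratio must be positive")
    return max(0.0, 100.0 * (1.0 - 1.0 / r_dmso_over_fragment))


def is_liganded(r_dmso_over_fragment: float) -> bool:
    """Apply the R >= 4 criterion for a fragment liganding event."""
    return r_dmso_over_fragment >= LIGANDING_THRESHOLD


print(percent_blockade(4.0), is_liganded(4.0))  # the threshold itself: 75% blockade
print(is_liganded(2.0))                         # 50% blockade does not qualify
```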

Figure 3: Proteome-wide screening of lysine-reactive fragment electrophiles.

a, General protocol for competitive isoTOP-ABPP. Ligandability ratios, or R values, are measured by quantifying the relative MS1 chromatographic peak intensities for 1-labelled peptides in DMSO- (heavy, or blue) versus fragment-treated (light, or red) samples. An R value of ≥4 was used to define a fragment liganding event for a quantified lysine. b, General structures of a lysine-reactive, electrophilic fragment library. Supplementary Fig. 4 shows chemical structures of library members. c, Left: Fraction of total quantified lysines and proteins that were liganded by fragment electrophiles. Middle: of the liganded proteins, the fraction found in DrugBank. Right: functional classes of liganded DrugBank and non-DrugBank proteins. d, Number of liganded and quantified lysines per protein. Analysis was applied to proteins containing at least one liganded lysine. e, R values for ten lysines in PFKP, identifying K688 as the only liganded lysine in this protein. Each point represents a distinct fragment–lysine interaction quantified by isoTOP-ABPP. The red dashed line marks the R value of 4 used to define a fragment liganding event. f, Comparison of the ligandability of lysine residues as a function of reactivity with probe 1 (as measured in Fig. 1). Individual lysines are plotted on the x axis, sorted by reactivity, which is shown on the left y axis, with lower R values correlating with elevated reactivity. A histogram with a bin size of 200 is shown in blue for the percentage of liganded lysines within each reactivity bin (percent values shown on the right y axis). g, Lysine reactivity distribution for both liganded and unliganded lysine residues labelled by probe 1. h, Overlap of proteins harbouring liganded lysines and liganded cysteines. Cysteine ligandability was taken from ref. 10.

We quantified, on average, >2,700 lysines per data set and, in aggregate, >8,000 lysines from 2,430 proteins across all data sets (Fig. 3c and Supplementary Table 2). Each lysine was quantified, on average, in 24 individual experiments (Supplementary Fig. 4 and Supplementary Table 2), providing a good initial assessment of ligandability potential. We identified, in total, 121 liganded lysines in 113 proteins (Fig. 3c). We quantified, on average, approximately four lysines per protein that reacted with probe 1 (Fig. 3d), indicating that ligandability was a rare feature. A striking example is PFKP, where a single liganded lysine was identified—the aforementioned K688—along with nine additional quantified lysines that showed no evidence of ligandability (Fig. 3e). Similarly, hexokinase-1 (HK1) possessed a single liganded lysine K510 among six quantified lysines (Supplementary Fig. 4). The majority of proteins harbouring liganded lysines were not found in DrugBank (73%, Fig. 3c) and these proteins showed a much broader class distribution than the smaller fraction of DrugBank proteins containing liganded lysines (27%), which were mostly enzymes (Fig. 3c). Prominent subgroups of non-DrugBank proteins with liganded lysines included transcription factors and scaffolding proteins (Fig. 3c), which are considered challenging to target with small molecules.

Hyper-reactive lysines showed greater ligandability than less reactive lysines, although many liganded lysines were also found in the latter group (R10:1 > 2.0, Fig. 3f,g). Of note, only a small fraction (~20%) of proteins with liganded lysines were found to contain liganded cysteines in a previous study10 (Fig. 3h). These results, taken together, indicate that fragment electrophile interactions with lysines depend on both reactivity and recognition and canvass a distinct and complementary portion of the human proteome compared to covalent chemistries targeting other nucleophilic amino acids.

Structure–activity relationship analysis of lysine-fragment electrophile interactions

Most of the liganded lysines (69%) interacted with a limited fraction (<10%) of the tested fragment electrophiles, although a small subset of lysines (8%) were targeted by a substantial portion of the compounds (≥25%) (Supplementary Fig. 5). Conversely, the fragment electrophiles showed large differences in proteomic reactivity towards lysines (Supplementary Fig. 5), ranging from 1 to 35% of the liganded residues (Supplementary Fig. 5). No lysine reactivity was observed for the non-electrophilic control fragment 51 (Supplementary Figs 4 and 5). The dinitrophenyl esters showed somewhat greater overall reactivity compared to the corresponding pentafluorophenyl esters (Supplementary Fig. 5), which correlated with the faster solvolysis rates observed for the former class of compounds (Supplementary Fig. 5). Despite these general trends, individual lysines displayed markedly distinct structure–activity relationships (SARs) that, in some cases, directly opposed the overall reactivity profiles of the fragment electrophile library (Fig. 4a and Supplementary Table 2). The hyper-reactive lysine K35 in the hormone-binding protein transthyretin (TTR), for instance, which has previously been shown to be modified selectively in human plasma by activated (thio)ester and sulfonyl fluoride ligands19,28, was preferentially targeted by the dinitrophenyl ester fragment 31 over fragments that showed much greater proteome-wide reactivity (for example, 29 and 30) (Fig. 4a and Supplementary Fig. 5). Further evidence that recognition events make substantive contributions to fragment–lysine interactions was found in the distinct lysine reactivity profiles displayed by fragment electrophiles bearing a common leaving group (Fig. 4b, left). We confirmed these SAR assignments by gel-based ABPP with recombinantly expressed proteins (Fig. 4b, right, and Supplementary Fig. 5). The identity of the leaving group of activated ester fragments also influenced reactivity, as reflected by a subset of lysines that were preferentially liganded by pentafluorophenyl or dinitrophenyl esters bearing the same recognition group (Supplementary Fig. 5). The most distinctive lysine reactivity profiles were observed for N,N′-diacyl-pyrazolecarboxamidine fragments 49 and 50, which, despite sharing several targets with activated esters, also reacted with 15 lysines in human cell proteomes that showed negligible cross-reactivity with activated esters (see representative proteins at the bottom of Fig. 4a and Supplementary Table 2). We confirmed the reactivity of one of these lysines (K89 of NUDT2) with N,N′-diacyl-pyrazolecarboxamidine fragments by recombinant expression of the parent protein and competitive gel-based ABPP (Supplementary Fig. 5).
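The per-lysine hit fractions quoted in this section (<10% versus ≥25% of tested fragments) can be computed from a table of competition ratios, as in the following Python sketch. The lysine and fragment identifiers and all R values are hypothetical placeholders rather than data from this study, the class labels are our own shorthand, and the sketch assumes that fragments for which a lysine was not quantified count as non-hits.

```python
from collections import defaultdict

LIGANDING_THRESHOLD = 4.0  # R (DMSO/fragment) cut-off used to define a hit


def hit_fractions(r_values: dict, n_fragments: int) -> dict:
    """Return, for each lysine, the fraction of screened fragments with R >= 4.

    r_values maps (lysine_id, fragment_id) -> competition ratio R.
    Assumption: the denominator is the full library size, so unquantified
    lysine-fragment pairs are treated as non-hits.
    """
    hits = defaultdict(int)
    for (lysine, _fragment), r in r_values.items():
        hits[lysine] += 1 if r >= LIGANDING_THRESHOLD else 0
    return {lysine: count / n_fragments for lysine, count in hits.items()}


def selectivity_class(fraction: float) -> str:
    """Label a lysine using the <10% and >=25% cut-offs mentioned in the text."""
    if fraction < 0.10:
        return "restricted"
    if fraction >= 0.25:
        return "broad"
    return "intermediate"


# Hypothetical example: a 30-member library and two lysines.
example = {("protA_K10", f"frag{i}"): 6.0 if i < 2 else 1.1 for i in range(30)}
example.update({("protB_K20", f"frag{i}"): 8.0 if i < 9 else 1.0 for i in range(30)})

for lysine, fraction in hit_fractions(example, n_fragments=30).items():
    print(lysine, round(fraction, 2), selectivity_class(fraction))
```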

Figure 4: Analysis of fragment–lysine interactions.

a, Heatmap showing R values for representative lysines and fragments organized by relative proteomic reactivity of the fragments (high to low, left to right) and number of fragment hits for individual lysines (high to low, top to bottom). ND, not detected. b, Fragment SAR determined by competitive isoTOP-ABPP is recapitulated by gel-based ABPP of recombinant proteins. Left: heatmap depicting R values for the indicated fragment–lysine interactions determined by competitive isoTOP-ABPP. Right: HEK293T cells recombinantly expressing representative liganded proteins (or the corresponding lysine-to-arginine (KR) mutants) as Flag epitope-tagged proteins were treated with fragment electrophiles (50 µM, 1 h) followed by the indicated lysine-reactive probes and analysed by gel-based ABPP. SIN3A corresponds to amino acids 1–400 of SIN3A.

We next set out to confirm fragment–lysine adducts by developing a quantitative, MS-based platform that simultaneously measured both fragment electrophile modification of lysines in individual proteins and the fractional occupancy and specificity of these reactions (Fig. 5a). Proteins containing liganded lysines discovered by isoTOP-ABPP were expressed with a Flag epitope tag in HEK293T cells, treated with fragment electrophiles or DMSO, enriched by anti-Flag immunoprecipitation, proteolytically digested and the tryptic peptides from fragment- and DMSO-treated samples then isotopically differentiated by reductive dimethylation (ReDiMe)40,41, combined pairwise and analysed by LC-MS/MS. This protocol yielded high average sequence coverage (>40%) for the six tested proteins (PFKP, PNPO, HK1, HDHD3, XRCC6 and SIN3A) and, in each case, we obtained definitive evidence that the liganded lysine assigned by isoTOP-ABPP was directly adducted by the corresponding electrophilic fragment (Fig. 5b and Supplementary Table 2). We also observed depletion of the unmodified tryptic peptides containing the liganded lysines and/or adjacent peptides requiring the liganded lysine as a cleavage site (Fig. 5b, blue dots). Other tryptic peptides generated by a lysine cleavage event were unaffected by fragment electrophile treatment (Fig. 5b, black dots), indicating the specificity of fragment reactions with individual lysines on the tested proteins (as also predicted by isoTOP-ABPP, Fig. 3d).

Figure 5: Confirmation of site-specific fragment–lysine reactions by MS-based proteomics.

a, Schematic workflow for direct measurement of lysine–fragment reactions on proteins by quantitative proteomics. Flag epitope-tagged proteins are recombinantly expressed in HEK293T cells and the cellular lysates treated with DMSO or a fragment ligand and immunoprecipitated with anti-Flag agarose resin. The enriched proteins are eluted from the resin, digested with trypsin and tryptic peptides from the DMSO and fragment-treated samples isotopically labelled by reductive dimethylation (ReDiMe) with heavy (blue) and light (red) formaldehyde40,41, respectively. The DMSO and fragment-treated samples are then combined and analysed by LC-MS. Lysine–fragment reactions were confirmed by both (1) detection of the peptide–fragment adduct exclusively in the fragment-treated sample (top trace) and (2) depletion of the parent unlabelled tryptic peptides containing the indicated lysine or having the lysine at a proteolytic cleavage site (bottom trace). b, R values for all detected, unmodified lysine-containing tryptic peptides for representative liganded proteins after treatment with the indicated compounds at 50 µM for 1 h. Unmodified peptides that contain the liganded lysine or have it at a proteolytic cleavage site are shown as blue dots. MS1 chromatographic peaks for fragment–peptide adducts are shown in the inset traces.

Functional analysis of fragment–lysine interactions

We next aimed to determine the functional impact of fragment–lysine interactions mapped by isoTOP-ABPP. As initial case studies, we selected two enzymes with liganded active-site lysines: pyridoxamine-5′-phosphate oxidase (PNPO) and NUDT2. PNPO catalyses the FMN-dependent oxidation of pyridoxamine-5′-phosphate and pyridoxine-5′-phosphate to pyridoxal-5′-phosphate in vitamin B6 synthesis42. PNPO possesses a hyper-reactive lysine K100 (R10:1 = 0.7, Supplementary Table 1) located in the enzyme's active site and shown in previous structural studies to interact with substrate42 (Supplementary Fig. 6). Competitive isoTOP-ABPP uncovered a highly restricted SAR for ligand engagement of K100, with only two fragments (19 and 22) fully blocking probe 1 labelling of this residue (Supplementary Fig. 6 and Supplementary Table 2). We confirmed, by gel-based ABPP, that fragment 19 blocked probe labelling of K100 in PNPO with an apparent half maximal inhibitory concentration (IC50) value of 3 µM (Fig. 6a and Supplementary Fig. 6). A similar IC50 value (~5 µM) was measured for blockade of PNPO catalytic activity by 19 using a substrate assay43 (Fig. 6a). The inhibitory effect of 19 was not observed with a K100R mutant of PNPO (Fig. 6a), which also did not label with amine-reactive probes (Supplementary Fig. 6).

Figure 6: Fragment–lysine reactions inhibit the function of diverse proteins.

a–c, Fragments targeting active site (PNPO (a) and NUDT2 (b)) and allosteric (PFKP) (c) lysines in metabolic enzymes block enzymatic activity in a concentration-dependent manner, with apparent IC50 values comparable to those measured by gel-based ABPP with lysine-reactive probes (probe labelling). Data represent means ± standard deviation for at least three experiments. CI, confidence interval. d, Liganded lysine K155 in SIN3A (red) is located at the protein–protein interaction site of the PAH1 domain (green) (shown in the NMR structure (PDB ID: 2RMS) to interact with the SID domain of SAP25 (ref. 64) (blue)). e, Fragment 21 (50 µM) fully competes probe 1 labelling of K155 of SIN3A, as determined by isoTOP-ABPP of human cancer cell proteomes. See inset in d for a representative MS1 chromatographic peak of the tryptic peptide containing K155 (R = 20, >95% blockade of probe 1 labelling by 21). f, Gel-based ABPP confirms that 21 blocks probe 17 labelling of SIN3A at K155 in a concentration-dependent manner. g, Heatmap showing the enrichment of SIN3A-interacting proteins in co-immunoprecipitation-MS-based proteomic experiments and blockade of SIN3A–TGIF1 interaction by 21 (50 µM, 1 h). h,i, Western blot analysis of Flag-SIN3A or the indicated Flag-SIN3A mutants, or Flag-GFP, co-expressed in HEK293T cells with Myc-TGIF1 or Myc-TGIF2. Cellular lysates were treated with DMSO or 21 (50 µM, 1 h), and proteins immunoprecipitated before western blot analysis (h). Quantification of western blotting data for four biological replicates (i). Data represent means ± standard deviation for four experiments. Statistical significance was calculated with unpaired Student's t-tests comparing SIN3A WT + 21 to K155R + 21 groups: *P < 0.05, **P < 0.01. In f–i, Flag-SIN3A or the indicated Flag-SIN3A mutants correspond to amino acids 1–400 of SIN3A.

NUDT2 is responsible for the catabolism of nucleotide cellular stress signals in human cells37 and was found to contain a hyper-reactive and liganded lysine K89 that is located proximal to the enzyme's nucleotide-binding site (Supplementary Fig. 3). K89 also exhibited a restricted SAR by isoTOP-ABPP, preferentially reacting with the two N,N′-diacyl-pyrazolecarboxamidine fragments 49 and 50 (Supplementary Fig. 6 and Supplementary Table 2). We confirmed by gel-based ABPP that fragment 49 blocked probe labelling of NUDT2 with an apparent IC50 of 2 µM (Fig. 6b and Supplementary Fig. 6), and an equivalent IC50 value was measured for inhibition of NUDT2 activity using a substrate assay44 (Fig. 6b), which was also used to determine a kobs/[I] (a kinetic parameter that measures covalent binding interactions) value for 49 of 46.3 ± 1.3 M−1 s−1 (Supplementary Fig. 6). Because mutation of K89 to arginine (K89R) inactivated NUDT2 in the substrate assay (Fig. 2d), we could not test the inhibitory effect of 49 on the K89R mutant, but we did confirm by gel-based ABPP that the K89R mutant showed a substantial reduction in amine-reactive probe labelling equivalent to that observed following treatment of NUDT2 with 49 (Supplementary Fig. 6).
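To give a sense of scale for the kobs/[I] value reported above, the following Python sketch converts it into a time course of enzyme inactivation. It assumes simple pseudo-first-order behaviour (inhibitor in large excess over enzyme and kobs proportional to [I] over the range considered), which is a textbook simplification rather than a model fitted in this study; the 50 µM concentration is chosen only for illustration, and the function names are hypothetical.

```python
import math

K_OBS_OVER_I = 46.3  # M^-1 s^-1, value reported in the text for fragment 49


def fraction_active(k_obs_over_i: float, inhibitor_molar: float, seconds: float) -> float:
    """Fraction of enzyme still unmodified after a given incubation time.

    Assumption: pseudo-first-order inactivation, so that
    k_obs = (k_obs/[I]) * [I] and the active fraction decays as exp(-k_obs * t).
    """
    k_obs = k_obs_over_i * inhibitor_molar  # units: s^-1
    return math.exp(-k_obs * seconds)


def inactivation_half_time(k_obs_over_i: float, inhibitor_molar: float) -> float:
    """Time (s) for half of the enzyme to be modified under the same assumption."""
    return math.log(2) / (k_obs_over_i * inhibitor_molar)


# Illustrative concentration (an assumption, not an assay condition from the text).
conc = 50e-6  # 50 µM expressed in mol/L
print(round(inactivation_half_time(K_OBS_OVER_I, conc)), "s half-time")
print(round(fraction_active(K_OBS_OVER_I, conc, 3600), 4), "active after 1 h")
```

Under these assumptions the half-time at 50 µM is on the order of five minutes, so a 1 h incubation would correspond to essentially complete modification.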

We next turned our attention to liganded lysines residing in more poorly characterized sites on proteins, specifically, a putative allosteric pocket in PFKP and a protein–protein interaction site in SIN3A. PFKP is responsible for the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate, the committed step of glycolysis45. Probe 1 labelling of the hyper-reactive lysine K688 in PFKP was completely blocked by fragment 20, which otherwise exhibited limited reactivity across the proteome (Fig. 4a and Supplementary Figs 5 and 6). Gel-based ABPP confirmed that 20 blocked probe labelling of recombinant PFKP with an apparent IC50 of 2 µM (Fig. 6c and Supplementary Fig. 6), and a loss in probe reactivity was observed for the K688R mutant of PFKP (Fig. 1e and Supplementary Fig. 6). Using an enzyme-coupled assay monitoring the conversion of NAD+ to NADH by ultraviolet absorbance46, we found that the activity of WT-PFKP, but not that of the K688R-PFKP mutant, was inhibited by 20 with an apparent IC50 of 2.9 µM (Fig. 6c and Supplementary Fig. 6). Fragment 20 inhibition of the catalytic activity of WT-PFKP plateaued at ~80% reduction in substrate turnover (Fig. 6c and Supplementary Fig. 6), indicating that ligand reactivity at the K688 allosteric site substantially, but incompletely, blocks enzyme function.
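The plateau at ~80% inhibition can be pictured with a simple saturable dose-response curve. The sketch below is an illustration under stated assumptions, not the fitting procedure used in the study: it assumes a Hill slope of 1 and treats the reported apparent IC50 (2.9 µM) as the concentration producing half of the maximal effect. The function name and the `max_inhibition` parameter are hypothetical.

```python
def fractional_activity(conc_um: float,
                        ic50_um: float = 2.9,
                        max_inhibition: float = 0.8) -> float:
    """Remaining enzyme activity (1.0 = uninhibited) at a given fragment concentration.

    Assumes a hyperbolic (Hill slope = 1) response that saturates at
    `max_inhibition` rather than at complete inhibition.
    """
    if conc_um < 0:
        raise ValueError("concentration must be non-negative")
    occupancy = conc_um / (conc_um + ic50_um)
    return 1.0 - max_inhibition * occupancy


if __name__ == "__main__":
    for c in (0.0, 2.9, 29.0, 290.0):  # µM, illustrative concentrations only
        print(f"{c:6.1f} µM -> {fractional_activity(c):.2f} of control activity")
```

In this model the activity never falls below 20% of control no matter how much fragment is added, which is the behaviour expected when modification of an allosteric site attenuates rather than abolishes catalysis.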

SIN3A is a multidomain 145 kDa transcriptional repressor involved in histone deacetylase regulation47 and suppression of MYC-responsive genes48. We found that SIN3A contains a hyper-reactive lysine K155 (R10:1 = 1.2, Supplementary Table 1) located in the first paired amphipathic helix (PAH1) domain of the protein (Fig. 6d). Our isoTOP-ABPP experiments revealed that fragment 21 engages K155 in SIN3A (Fig. 6d, inset, and Fig. 6e), but otherwise shows low proteome-wide reactivity (Fig. 6e and Supplementary Fig. 5). We recombinantly expressed a Flag-tagged SIN3A variant containing the N-terminal PAH1 and PAH2 protein–protein interaction domains (amino acids 1–400) in HEK293T cells and found that treatment of cell lysates with 21 produced a site-specific and complete blockade of probe labelling of K155 with an apparent IC50 of 5 µM (Fig. 6f and Supplementary Fig. 7). We then used quantitative SILAC (stable isotopic labelling with amino acids in cell culture49) proteomics to identify SIN3A-interacting proteins that were sensitive to mutation of K155 and/or treatment with 21. HEK293T cells metabolically labelled with isotopically differentiated amino acids were transfected with cDNA constructs for Flag-SIN3A (heavy-labelled cells) or Flag-GFP (light-labelled cells), collected, lysed and immunoprecipitated with anti-Flag antibodies. Heavy- and light-labelled immunoprecipitates were combined and subjected to tryptic digestion followed by LC-MS/MS analysis, which furnished a set of SIN3A-interacting proteins, defined as proteins that were substantially (more than fivefold) enriched in the SIN3A-transfected compared to GFP-transfected samples (Fig. 6g and Supplementary Table 2). Similar quantitative proteomic experiments compared WT-SIN3A to a K155W-SIN3A mutant and DMSO-treated WT-SIN3A to 21-treated WT-SIN3A. The K155W mutant, which was generated to mimic incorporation of a bulky hydrophobic group into the 21-sensitive pocket of SIN3A, failed to substantially enrich two established SIN3-interacting proteins, TGIF1 and TGIF2 (refs 50, 51), that co-immunoprecipitated with WT-SIN3A (Fig. 6g and Supplementary Table 2). Treatment with 21 also strongly blocked the TGIF1–SIN3A interaction, but only produced a marginal effect on TGIF2–SIN3A interaction (Fig. 6g and Supplementary Table 2). Other known SIN3A-interacting proteins that co-immunoprecipitated with WT-SIN3A, such as MAX (ref. 52), MNT (ref. 52) and MXI1 (ref. 53), were not affected by K155W mutation or 21 treatment (Fig. 6g).
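The enrichment criterion used to define SIN3A-interacting proteins (more than fivefold higher in the heavy, SIN3A-transfected sample than in the light, GFP-transfected sample) amounts to a threshold on per-protein heavy/light ratios. The sketch below illustrates only that filtering step; the function name, the input structure and the placeholder ratios are hypothetical and are not data or software from the study.

```python
from typing import Dict, List

ENRICHMENT_CUTOFF = 5.0  # "more than fivefold" heavy/light enrichment, per the text


def interacting_proteins(heavy_light_ratios: Dict[str, float],
                         cutoff: float = ENRICHMENT_CUTOFF) -> List[str]:
    """Return, in alphabetical order, proteins whose heavy (Flag-SIN3A) to
    light (Flag-GFP) SILAC ratio is strictly greater than the cutoff."""
    return sorted(name for name, ratio in heavy_light_ratios.items() if ratio > cutoff)


if __name__ == "__main__":
    # Placeholder names and ratios, invented purely to exercise the function.
    example = {"protein_A": 18.2, "protein_B": 5.0, "protein_C": 1.1, "protein_D": 7.4}
    print(interacting_proteins(example))
```

The comparison is strict, so a protein with a ratio of exactly 5.0 is not counted as enriched, matching the "more than fivefold" wording.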

We further evaluated the effect of 21 on SIN3A interactions with TGIF1/TGIF2 by co-expressing these proteins with complementary epitope tags (Flag and Myc, respectively). In this system, fragment 21 treatment, as well as K155W mutation, blocked the co-immunoprecipitation of TGIF1 as measured by anti-Myc blotting (Fig. 6h,i). The K155W mutant also strongly inhibited co-immunoprecipitation of TGIF2 with SIN3A, while 21 exerted a partial blockade of this association (Fig. 6i and Supplementary Fig. 7). Importantly, mutation of K155 to arginine (K155R) conferred resistance to the effects of 21 on the SIN3A–TGIF1 interaction (Fig. 6h,i and Supplementary Fig. 7). Taken together, these data demonstrate that covalent ligands targeting K155 in SIN3A can pharmacologically disrupt a select subset of protein–protein interactions implicated in gene regulation.

Discussion

Chemical proteomic technologies, such as ABPP, have proven valuable for ligand/drug development by providing quantitative readouts of target engagement and selectivity in native biological systems10. Considering its nucleophilic side chain and prevalence in proteins, lysine is an attractive candidate amino acid for covalent ligand development. pKa-perturbed lysine residues play important functional roles in proteins54,55 and electrophilic compounds have been found to target lysines in diverse types of proteins (for example, metabolic enzymes, such as PGAM1 (ref. 56), hormone-binding proteins, such as TTR (ref. 57), lipid kinases, such as PI3Ks (ref. 18) and adaptor proteins, such as MCL-1 (ref. 13)). Nonetheless, our understanding of lysine reactivity and ligandability across the human proteome remains limited. We and others have used the chemical proteomic method isoTOP-ABPP to measure the reactivity31 and covalent ligand interactions of cysteine residues in native biological systems10,58,59,60. Here, we have extended this platform to globally profile the reactivity and ligandability of thousands of lysine residues in human cell proteomes. Key to success was selection of an electrophilic group, the STP ester, that displayed broad and selective reactivity with lysines over other proteinogenic amino acids, which probably accounted for the much deeper coverage of lysines compared to first-generation probes based on aryl halide reactive groups21.

When combined with previous chemical proteomic studies of cysteine reactivity31, our results provide further evidence that heightened reactivity of nucleophilic amino acids is a hallmark of functionality and ligandability. Cysteine, however, is a much less frequent amino acid in proteins compared to lysine and, in this context, we find it noteworthy that hyper-reactive lysines could be site-selectively modified by activated ester probes in proteins that harbour 50+ other lysines (for example, Fig. 1e and Supplementary Fig. 2). This feature enabled screening of these hyper-reactive lysines for ligandability using convenient gel-based assays (for example, Supplementary Fig. 5). On the other hand, the greater frequency of lysine compared to cysteine in proteins presents a technical challenge for achieving a complete inventory of lysines in the proteome. This problem may not simply be overcome by raising the concentration of activated ester probe in chemical proteomic experiments, which we have found instead tends to increase the signals and coverage of lower-reactivity lysines in abundant proteins (possibly at the expense of detecting high-reactivity lysines on lower-abundance proteins). More promising might be to perform additional upfront chromatography steps to better fractionate the proteome before enrichment and MS analysis of peptides containing probe-reactive lysines. Additionally, because probe reactivity with lysines blocks tryptic cleavage sites, the use of alternative proteases may uncover additional probe–lysine reactivity events that evade detection in conventional tryptic digest protocols. Finally, subsets of lysines can be selectively targeted with greater sensitivity using tailored electrophilic probes that leverage recognition elements to favour binding to functional protein pockets, such as the ATP-binding sites of kinases11,29.

Our chemical proteomic experiments have also provided valuable initial insights into the global ligandability potential of lysines in the human proteome. Most of the liganded lysines discovered herein were found in proteins lacking small-molecule probes, including proteins not present in DrugBank or targeted by cysteine-reactive fragments in a previous study10. We also demonstrated that lysine-reactive fragments can block the function of proteins, including inhibition of enzyme activity by both active site (PNPO, NUDT2) and allosteric (PFKP) mechanisms, as well as disruption of specific protein–protein interactions in transcriptional regulatory complexes (SIN3A–TGIFs). The SIN3A–TGIF1 interaction has been found to contribute to invasiveness of triple negative breast cancer50, suggesting that more optimized chemical probes targeting K155 in SIN3A may exert anti-tumorigenic effects.

Based on our competitive isoTOP-ABPP results, we believe that a broader effort to discover covalent ligands for lysines has the potential to substantially expand the druggable content of the human proteome. The success of such a programme, however, may depend on identifying alternative amine-reactive chemotypes, as the activated esters tested herein are probably too prone to enzymatic and non-enzymatic hydrolysis for development into cellular or in vivo probes. Alternative amine-reactive electrophiles, such as sulfonyl fluorides28,29 or the N,N′-diacyl-pyrazolecarboxamidines explored herein, may offer more suitable starting points for optimization of lysine-targeting covalent ligands for cell biological studies. Alternative electrophiles, when used as broad profiling probes, may also provide access to additional lysine residues in the proteome, although the chemoselectivity of such probes could present a challenge. While our manuscript was under review, for instance, Ward and colleagues characterized the proteomic reactivity of an NHS-ester probe and found that, while this activated ester labelled lysines, it also showed substantive reactivity with several other amino-acid residues (serine, threonine, tyrosine, arginine, cysteine) across the mouse liver proteome61. These results are consistent with our initial gel-based profiling studies of a similar NHS-ester probe (8), which showed substantially higher overall proteomic reactivity compared to STP probe 1 (Supplementary Fig. 1).

In summary, we have described a quantitative chemical proteomic platform to globally map the reactivity and ligandability of lysine residues in the human proteome. Projecting forward, it is interesting to speculate on the broader functional ramifications of lysines that display heightened reactivity. Minimally, this feature appears to correlate well with ligandability, which could reflect the enriched presence of hyper-reactive lysines in pockets, where the pKa of these residues can presumably be altered. On the other hand, the localization of hyper-reactive lysines to pockets could also restrict their access to post-translational machinery, such as ubiquitylation processes (Fig. 2c), which may instead mostly target surface-exposed (that is, less reactive) lysines. We also believe that our studies, despite having uncovered more than 100 lysines targeted by fragment electrophiles, almost certainly still underestimate the global ligandability of lysines in the human proteome. The development and evaluation of larger compound libraries displaying more diversified recognition and amine-reactive elements, including covalent-reversible electrophiles (for example, aldehydes), in combination with surveying complementary cell types (for example, primary immune cells62 and metabolic organs63), should greatly enrich our understanding of functional and ligandable lysines in the human proteome and, through doing so, extend its druggable landscape for basic and translational research objectives.

Methods

A detailed Methods section is provided in the Supplementary Information.

Data availability

The data supporting the findings discussed here are available within the paper, its Supplementary Information and Supplementary Tables 1 and 2, as well as from the corresponding authors upon request.
