Introduction

Covalent drugs incorporate a mildly reactive functional group that forms a covalent bond with protein targets to confer additional affinity beyond the non-covalent interactions involved in drug binding1. Historically, concerns about the interference of these reactive molecules with biological assays and potential lack of selectivity often discouraged further investigation2,3. Many early covalent drugs were discovered serendipitously and bind active sites to inhibit enzymatic activity4. These drugs often mimic a substrate transition state to enable covalent modification of a catalytic amino acid residue. Over the past 30 years, the rational design of covalent drugs has garnered increased interest, and covalently targeting non-conserved amino acids to increase selectivity has become commonplace2,5. The prolonged target engagement of covalent drugs can provide distinct pharmacodynamic profiles and exceptional potency6.

The potential benefits of covalency have inspired medicinal chemists to explore the covalent drug space despite concerns about reactivity. In many cases, compromises between reactivity, selectivity and potency have produced safe and effective drugs. Key examples that we discuss here (Fig. 1 and Table 1) include the Bruton’s tyrosine kinase (BTK) inhibitor ibrutinib (AbbVie) and the epidermal growth factor receptor (EGFR) inhibitor osimertinib (AstraZeneca), with sales totalling US$8.43 billion and $4.33 billion in 2020, respectively7,8. Moreover, potent inhibition through covalent modification has enabled targeting of traditionally ‘undruggable’ proteins, exemplified by the approval of sotorasib (Amgen), which is an inhibitor of mutant KRAS(G12C), a GTPase that resisted decades of drug discovery efforts9,10 (Fig. 1). At the same time, more traditional covalent targeting of protease active sites has continued to yield valuable drugs, such as nirmatrelvir (Pfizer), which inhibits the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro)11 (Fig. 1).

Fig. 1: Timeline of the development of major covalent drugs.

Each covalent drug is classified according to the drug type or type of disease it treats. Unless otherwise indicated, the date refers to the first approval by the US Food and Drug Administration. NSAID, non-steroidal anti-inflammatory drug.

Table 1 Key examples of covalent drugs

Targeted covalent inhibitors are often discovered through structure-guided design by incorporating an electrophile into a ligand that would otherwise reversibly bind the target protein. The incorporated electrophile binds irreversibly to an amino acid on the target protein, introducing a covalent interaction in addition to the reversible interactions already at play.

Covalent ligand screening is another ligand discovery approach that is becoming more common, whereby various methods are used to discover covalent ligands from libraries of electrophilic compounds7,12,13. This ‘electrophile-first’ approach is partly facilitated by the development of chemoproteomic platforms that enable rapid target identification and selectivity profiling of covalent ligands14,15,16,17,18. Combining the advances in covalent ligand screening and chemoproteomics with structural biology to empower medicinal chemistry has the potential to generate molecules that selectively bind challenging protein targets.

In this Review, we start by briefly highlighting historical examples of covalent drugs and their mechanisms of action. We then elaborate on milestones in covalent drug discovery over the past decade, categorizing our discussion on the basis of the discovery approach taken. Finally, we summarize the toolbox of emerging covalent drug discovery techniques, with emphasis on screening strategies and selectivity profiling.

History of covalent drugs

Compounds that contain protein-reactive functional groups have often been avoided in medicinal chemistry and excluded from compound screening collections owing to their potential for assay interference and off-target promiscuity. Many historical examples of covalent drugs were discovered to act through covalent mechanisms after their use was already widespread. One of the most prominent among these is the non-steroidal anti-inflammatory drug (NSAID) aspirin, which has been marketed since 1899 (ref.19) (Fig. 1). Aspirin’s mechanism of action was unknown until 1971, when it was discovered to exert its anti-inflammatory effects by acetylating Ser529 in the substrate-binding channel of cyclooxygenase 1, preventing conversion of the substrate arachidonic acid into prostaglandins20.

Early covalent drugs also tend to be derived from or inspired by natural sources. β-Lactam antibiotics such as penicillin (Fig. 1), produced by Penicillium fungi, bind to penicillin-binding proteins (PBPs), which are involved in bacterial cell wall synthesis21. All PBPs contain active-site serine residues that can be acylated by penicillin, inhibiting PBP activity and leading to cell membrane rupture21. Another covalent antibiotic is the epoxide-containing fosfomycin (Fig. 1), which is produced by some Streptomyces bacteria and acts by reacting with the catalytic cysteine of UDP-N-acetylglucosamine-enolpyruvyl transferase (MurA) to disrupt peptidoglycan synthesis and induce membrane rupture22,23,24.

Some covalent drugs are prodrugs with thiol-containing metabolites that form disulfide bonds to inactivate their targets25. The proton pump inhibitor omeprazole (Fig. 1), approved by the US Food and Drug Administration (FDA) in 1989 to treat gastro-oesophageal reflux disease, is an example of this and is also a drug that was brought to market before its mechanism of action was understood to be covalent. Both omeprazole and clopidogrel (Fig. 1), an antiplatelet medication used to prevent strokes and heart attacks, are activated by cytochrome P450 enzymes in the liver to produce bioactive thiol metabolites26.

Covalent drugs have also been historically significant in cancer therapy. The pyrimidine nucleoside analogues 5-fluorouracil27,28 and gemcitabine29 are prodrugs used to inhibit thymidylate synthase and ribonucleotide reductase I, respectively, to treat a wide range of cancers (Fig. 1). Bortezomib (Fig. 1), a dipeptide boronic acid that covalently binds to and inhibits a catalytic threonine residue of the 26S proteasome, was approved by the FDA in 2003 to treat patients with multiple myeloma30.

Covalent drugs have been used to treat a variety of diseases. However, focusing on covalency from the outset of a project, instead of discovering a covalent mechanism of action after the fact, provides opportunities to improve drug design. Recent work in this field showcases how covalent drug discovery tools present solutions to otherwise intractable drug discovery challenges.

Discoveries by ligand-first approaches

Major milestones of covalent drug discovery have been reached over the past decade, including the FDA approval of the first covalent EGFR inhibitor, afatinib, in 2013, the BTK inhibitor, ibrutinib, in 2013 and the discovery of other kinase inhibitors. To discover these compounds, mildly reactive electrophilic functional groups were incorporated into known reversible ligands to enhance the inhibition of protein function. These examples offer lessons for future programmes, as each compound must balance reactivity, potency and selectivity.

Covalent EGFR inhibitors

Overactivity of the receptor tyrosine kinase EGFR drives the progression of non-small-cell lung cancer (NSCLC), making EGFR a key drug target in oncology31. During clinical development in the early 2000s, the reversible, first-generation EGFR inhibitors gefitinib and erlotinib (Fig. 2a) were discovered to be effective against tumours harbouring somatic activating mutations in EGFR, either deletions in exon 19 or the L858R point mutation, which occur in 10–30% of patients with NSCLC31,32,33. However, the disease in these patients eventually still progressed; in 60% of cases this was due to the acquisition of the T790M ‘gatekeeper’ mutation34,35. This mutation of the gatekeeper residue in the ATP-binding site of EGFR not only decreases the binding affinity of many reversible inhibitors for EGFR but also increases the binding affinity of EGFR for ATP36.
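The effect of tighter ATP binding on a reversible, ATP-competitive inhibitor can be illustrated with the Cheng–Prusoff relationship, IC50 = Ki × (1 + [ATP]/Km). The sketch below is illustrative only: the function name is ours, and the Ki, Km and cellular ATP values are hypothetical placeholders chosen to show the trend, not measured constants for EGFR or for any specific inhibitor.

```python
def apparent_ic50(ki_nm: float, atp_um: float, km_atp_um: float) -> float:
    """Cheng-Prusoff IC50 (nM) for a reversible, ATP-competitive inhibitor."""
    return ki_nm * (1.0 + atp_um / km_atp_um)


# Hypothetical values for illustration only (not measured EGFR constants).
CELLULAR_ATP_UM = 1000.0  # assumed intracellular ATP of about 1 mM
KI_NM = 2.0               # assumed inhibitor Ki, held fixed in both cases

# A kinase that binds ATP weakly (high Km) versus tightly (low Km).
weak_atp_binder = apparent_ic50(KI_NM, CELLULAR_ATP_UM, km_atp_um=100.0)
tight_atp_binder = apparent_ic50(KI_NM, CELLULAR_ATP_UM, km_atp_um=10.0)

print(f"Km(ATP) = 100 uM: apparent IC50 = {weak_atp_binder:.0f} nM")
print(f"Km(ATP) =  10 uM: apparent IC50 = {tight_atp_binder:.0f} nM")
```

With the inhibitor Ki held constant, a tenfold tighter Km for ATP raises the apparent IC50 roughly tenfold in this toy calculation, which is the kind of penalty a reversible inhibitor faces when the kinase gains ATP affinity.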

Fig. 2: Progression of EGFR inhibitor structures.

Progression from first-generation (part a) to second-generation (part b) epidermal growth factor receptor (EGFR) inhibitors involved the addition of a reactive acrylamide electrophile (highlighted in red) to covalently bind a cysteine residue (Cys797) in EGFR. In the progression from second-generation to third-generation (part c) EGFR inhibitors, the quinazoline moiety is replaced with a pyrimidine unit to provide selectivity for the T790M mutant.

To overcome this problem, covalent second-generation inhibitors were strategically designed with acrylamide Michael acceptors to react with a cysteine residue (Cys797) in EGFR (Fig. 2b). Cys797 is located adjacent to the ATP-binding site, and irreversible binding of EGFR ligands to EGFR partially restores activity against the T790M gatekeeper mutant33. In addition to modest activity against T790M, covalent second-generation inhibitors provided prolonged suppression of EGFR signalling, suggesting that these covalent EGFR inhibitors could be more efficacious than reversible first-generation inhibitors such as erlotinib33. Afatinib (Boehringer Ingelheim) (Figs. 1 and 2b) was approved by the FDA in 2013 as a first-line treatment for patients with metastatic NSCLC with activating mutations in EGFR37,38. Despite the increased potency that covalent engagement brought against the disease target, the dose-limiting toxicity caused by inhibition of wild-type EGFR likely prevented afatinib from increasing overall survival when compared head-to-head with platinum-based chemotherapy in treating cancers bearing the T790M gatekeeper mutation39. Other second-generation inhibitors include neratinib (Puma), which potently inhibits HER2 by covalently binding Cys805 (a cysteine residue homologous to Cys797 on EGFR) and was approved by the FDA for treatment of HER2+ breast cancer in 2017, and dacomitinib (Pfizer) (Fig. 2b), which was approved by the FDA to treat NSCLC in 2018 (refs.40,41,42).

A third generation of EGFR inhibitors followed afatinib; these covalent inhibitors selectively target the T790M mutant over wild-type EGFR, and include WZ4002 (Dana–Farber Cancer Institute)43, osimertinib44,45 and rociletinib (Clovis Oncology; CO-1686)46 (Fig. 2c). These compounds maintain the acrylamide group to covalently bind Cys797 but exchange the quinazoline moiety of first-generation and second-generation compounds for a pyrimidine to promote selectivity for mutant EGFR(T790M) (ref.47). Higher affinity for T790M over wild-type EGFR not only results in efficacy in cancers with the EGFR gatekeeper mutation but also contributes to an improved safety profile and enables a higher recommended dose for osimertinib than for afatinib48. Osimertinib was granted accelerated approval by the FDA in 2015 as a second-line treatment for NSCLC, and was approved as a first-line treatment for metastatic NSCLC in 2018 (ref.49). However, osimertinib depends on Cys797 for covalent binding, and C797X mutations account for 15% of cases of resistance to second-line osimertinib50,51,52. Generally, drugs whose efficacy relies on covalent binding to a specific nucleophilic amino acid are vulnerable to mutations at that site, which could lead to drug resistance.

The success of covalent EGFR inhibitors has validated the approach of covalently engaging non-catalytic, non-conserved cysteines adjacent to kinase active sites to increase the potency and modulate the pharmacodynamics of initially reversible ligands. Development of these drugs has shown that the acrylamide electrophile is reactive enough to engage a cysteine adjacent to an ATP-binding site but not so reactive as to induce haptenization and an adverse immune response. Incorporating covalent binding in EGFR inhibitors also enables selectivity between kinases through an interaction with a non-conserved cysteine instead of highly conserved active-site residues that typically interact with ATP.

Covalent BTK inhibitors

The discovery of covalent BTK inhibitors shares several themes with the discovery of covalent EGFR inhibitors, including the ligand-first approach and the use of Michael acceptor electrophiles. BTK became a target of interest in chronic lymphocytic leukaemia owing to its crucial role downstream of the B cell receptor53. Activation of the B cell receptor induces phosphorylation of BTK through Lyn and Syk kinases, and eventually activates transcription factors related to B cell proliferation, differentiation, cell migration and adhesion54. This key role in B cell development indicated that BTK was a relevant target for B cell malignancies.

In the early 2000s, scientists at Celera Genomics who were interested in using BTK inhibitors to treat rheumatoid arthritis used a structure-based approach to discover an acrylamide-containing inhibitor of the BTK kinase domain that could be used as a tool compound to fluorescently label BTK55. It was subsequently discovered that the tool compound itself, later named ibrutinib (Table 1), had sufficient activity and suitable physicochemical properties to advance into clinical studies56,57,58. Ibrutinib was approved by the FDA for the treatment of mantle-cell lymphoma in 2013 and subsequently for chronic lymphocytic leukaemia (CLL), Waldenstrom’s macroglobulinaemia and chronic graft versus host disease59,60,61,62,63.

Similarly to EGFR inhibitors, ibrutinib binds to a cysteine residue (Cys481) adjacent to the ATP-binding site in BTK, and because only a few kinases have a homologous cysteine, ibrutinib should exhibit a degree of selectivity for BTK over other kinases64. The rapid clearance of ibrutinib (which has a half-life of 2–3 h) could also enable kinase selectivity; ibrutinib should maintain activity against BTK owing to prolonged covalent engagement, while the reversible inhibition of off-targets is minimized65. This combination of fast covalent engagement of BTK with rapid clearance might allow for selectivity in vivo despite the off-target kinase inhibition observed in biochemical assays.
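The interplay between rapid clearance and irreversible binding can be made concrete with a toy simulation. The sketch below is a minimal illustration, not a validated pharmacokinetic model: it assumes one-compartment exponential elimination with a 2.5 h half-life (within the 2–3 h range quoted above), second-order covalent labelling of the target, slow replacement of labelled protein by resynthesis, and simple equilibrium binding to a reversible off-target. The function name, all rate constants, the starting concentration and the off-target Kd are hypothetical values chosen only to show the qualitative trend.

```python
import math


def simulate_occupancy(hours=24.0, dt_h=0.01, c0_nm=300.0, t_half_h=2.5,
                       k_label_per_nm_h=0.05, k_resynth_per_h=0.03,
                       kd_offtarget_nm=100.0):
    """Toy model: returns (times, covalent occupancy, reversible occupancy)."""
    k_elim = math.log(2) / t_half_h           # first-order elimination rate
    times, covalent, reversible = [], [], []
    occ = 0.0                                 # fraction of target labelled
    for step in range(int(round(hours / dt_h)) + 1):
        t = step * dt_h
        conc = c0_nm * math.exp(-k_elim * t)  # plasma concentration, nM
        times.append(t)
        covalent.append(occ)
        # Reversible off-target: equilibrium occupancy follows concentration.
        reversible.append(conc / (conc + kd_offtarget_nm))
        # Forward-Euler update: labelling of free target minus resynthesis.
        d_occ = k_label_per_nm_h * conc * (1.0 - occ) - k_resynth_per_h * occ
        occ += d_occ * dt_h
    return times, covalent, reversible


times, covalent, reversible = simulate_occupancy()
for hour in (2, 6, 12, 24):
    i = int(round(hour / 0.01))  # index matching the default 0.01 h step
    print(f"t = {hour:>2} h  covalent target: {covalent[i]:.2f}  "
          f"reversible off-target: {reversible[i]:.2f}")
```

Under these assumed parameters, occupancy of the covalently labelled target remains high long after the plasma concentration has fallen, whereas occupancy of the reversibly bound off-target tracks the declining concentration.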

Several other covalent BTK inhibitors have been approved or are currently in clinical trials and some of these highlight the variety of Michael acceptors that can be used as alternative electrophiles to acrylamides66,67,68. Most prominent among these is acalabrutinib (AstraZeneca), approved by the FDA in 2019 to treat CLL, which contains a butynamide electrophile instead of an acrylamide69. The butynamide electrophile is less reactive than an acrylamide, which, in addition to other substitutions, is hypothesized to account for the superior selectivity of acalabrutinib compared with ibrutinib for BTK and could be responsible for the reduced number of adverse cardiovascular events70,71. Further work has examined the use of cyanoacrylamides as electrophiles to design reversible covalent BTK inhibitors, which would ideally show increased potency and lower covalent off-target reactivity than irreversible covalent BTK inhibitors72,73,74. The long, tunable off-rates of reversible covalent inhibitors highlight the grey area that exists between reversible inhibition and irreversible covalent mechanisms.

Overcoming historical concerns relating to the potential toxicity of covalent drugs, the success of ibrutinib demonstrates that rationally designed covalent drugs can achieve acceptable safety profiles and blockbuster status. Ibrutinib, and covalent EGFR inhibitors, demonstrate that kinase inhibitors that target non-conserved cysteines adjacent to the ATP-binding site can be developed into selective and potent drugs. The pharmacokinetic and pharmacodynamic properties of ibrutinib allow for prolonged BTK blockade while reducing off-target kinase inhibition through rapid clearance in vivo. Notably, the performance of ibrutinib in treating B cell malignancies emphasizes that molecules once considered chemical biology tool compounds can become effective drugs.

Other covalent kinase inhibitors

Covalent inhibitors have been used to selectively target kinases other than EGFR and BTK with non-conserved cysteine residues adjacent to their ATP-binding sites75,76. One example is Janus kinase 3 (JAK3), a non-receptor tyrosine kinase primarily expressed in leukocytes and involved in cytokine signalling77. Covalent targeting of the non-conserved Cys909 of JAK3 has yielded inhibitors selective for JAK3 over other JAK family members for the treatment of autoimmune diseases77,78,79,80,81. One of these inhibitors, ritlecitinib (Pfizer; PF-06651600), has shown promising results for patients with rheumatoid arthritis in a phase II clinical trial82. Several covalent inhibitors of fibroblast growth factor receptor 4 (FGFR4) target the non-conserved Cys552 residue in FGFR4 to confer selectivity over FGFR1, FGFR2 and FGFR3, as well as to overcome mutations that confer resistance to reversible FGFR inhibitors in hepatocellular carcinoma (HCC)83,84. The acrylamide-containing FGFR4 inhibitor fisogatinib (Blueprint Medicines; BLU-554) is currently the subject of a phase II clinical trial (NCT04194801). Aldehyde-containing roblitinib (Novartis; FGF401), which is a reversible covalent FGFR4 inhibitor that also reacts with Cys552, is also under clinical investigation (NCT02325739)83,85. Overall, the rational design of covalent kinase inhibitors that target non-conserved cysteines adjacent to the ATP-binding site has become a routine approach to enhancing the potency and selectivity of kinase inhibitors.

Discoveries by electrophile-first approaches

Covalent drugs are also discovered through electrophile-first approaches, meaning that the initial discovery process is rooted in finding a covalent ligand from the outset, instead of incorporating covalency into a known reversible ligand. Key examples of drugs discovered through this approach include the KRAS(G12C) inhibitor sotorasib and the SARS-CoV-2 Mpro inhibitor nirmatrelvir (Fig. 1).

Covalent KRAS(G12C) inhibitors

The discovery and development of covalent KRAS(G12C) inhibitors is one of the most exciting discovery-to-clinic stories featuring covalent drugs. KRAS is a GTPase-encoding oncogene that is mutated in about 25% of all cancers, most notably in pancreatic, colorectal and lung cancers86. Wild-type KRAS is carefully regulated between the active GTP-bound state and inactive GDP-bound state, but many KRAS mutations attenuate GTPase activity, leading to low rates of GTP hydrolysis and elevated RAS signalling, driving tumorigenesis87. Since the discovery of the role of KRAS in cancer nearly 30 years ago, attempts to drug it directly using traditional drug discovery methods have been unsuccessful9,10. KRAS does not have accessible pockets for reversible inhibitors to bind to, competitive inhibitors would need to overcome the picomolar binding affinities of GTP and GDP, and inhibitors active against wild-type KRAS could show on-target toxicity86,88.

Covalent KRAS inhibitors against the G12C mutant are appealing for several reasons. First, targeting mutant KRAS could allow for selective cytotoxicity to cancer cells. Second, the affinity enabled by covalent binding would be advantageous as KRAS lacks easily ligandable pockets. Third, 12–14% of KRAS mutations in NSCLC are KRASG12C, presenting a promising patient group that would directly benefit from KRAS(G12C) inhibition89. Finally, position 12 in KRAS sits closely beneath the effector-binding region and the nucleotide-binding pocket, suggesting that covalent KRAS(G12C) ligands might affect KRAS function87.

In 2013, researchers at the University of California, San Francisco reported the first mutant-selective covalent KRAS(G12C) inhibitor. The inhibitor (compound 12 in their study) was discovered through a disulfide-fragment screening approach known as tethering, whereby a library of 480 disulfide fragments was screened against KRAS(G12C) in the GDP-bound state using intact protein mass spectrometry (MS)88,90. Co-crystal structures of KRAS(G12C) showed that hit compounds bound to the switch II region, and subsequent medicinal chemistry efforts to exchange the disulfide moiety for acrylamide and vinyl sulfonamide electrophiles yielded KRAS(G12C) inhibitors that were active in vitro, including compound 12. Binding of compound 12 to the switch II pocket impaired KRAS signalling by shifting nucleotide affinity from favouring GTP to GDP and led to the accumulation of KRAS in its inactive state91.
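To illustrate how an intact protein MS readout from a tethering screen might be scored, the sketch below computes the expected mass of a protein–fragment disulfide adduct and estimates the percentage of protein labelled from a deconvoluted spectrum. This is a plausible scoring scheme written for illustration, not the procedure used in the cited study: the function names, the protein and fragment masses, the peak list and the matching tolerance are all hypothetical. The only chemistry assumed is that disulfide exchange adds the mass of the fragment thiol minus two hydrogens.

```python
H_MASS_DA = 1.008  # average mass of hydrogen


def expected_adduct_mass(protein_da: float, fragment_thiol_da: float) -> float:
    """Mass of protein-S-S-fragment formed from protein-SH and fragment-SH."""
    return protein_da + fragment_thiol_da - 2 * H_MASS_DA


def percent_labelled(peaks, protein_da, adduct_da, tol_da=2.0):
    """Percentage of signal at the adduct mass versus unmodified protein.

    peaks: iterable of (deconvoluted mass in Da, intensity) pairs.
    """
    def summed_intensity(target_da):
        return sum(i for m, i in peaks if abs(m - target_da) <= tol_da)

    unmodified = summed_intensity(protein_da)
    adduct = summed_intensity(adduct_da)
    total = unmodified + adduct
    if total == 0:
        raise ValueError("no peaks matched the protein or adduct mass")
    return 100.0 * adduct / total


# Hypothetical example values (not taken from the screen described above).
PROTEIN_DA = 19300.0
FRAGMENT_THIOL_DA = 250.3
adduct_da = expected_adduct_mass(PROTEIN_DA, FRAGMENT_THIOL_DA)
spectrum = [(19300.4, 4.0e5), (19420.0, 1.0e4), (19548.1, 6.0e5)]

labelling = percent_labelled(spectrum, PROTEIN_DA, adduct_da)
print(f"expected adduct mass: {adduct_da:.1f} Da, labelled: {labelling:.0f}%")
```

Ranking fragments by a labelling percentage of this kind is one simple way to prioritize hits; in practice the threshold and any competition conditions would be set by the assay design.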

This novel mechanism for selective KRAS(G12C) inhibition set the stage for the development of clinical covalent KRAS(G12C) inhibitors. In 2016, Wellspring Biosciences disclosed ARS-853, which is a selective covalent inhibitor of KRAS(G12C) with in cellulo efficacy in the low micromolar range92. ARS-853 emerged from structure-guided optimization of compound 12, guided by a cellular liquid chromatography with tandem mass spectrometry (LC–MS/MS)-based assay that measured the degree of KRAS(G12C) engagement in H358 cells92. ARS-853 treatment in KRAS(G12C)-dependent cell lines decreased the amount of active KRAS(G12C), inhibited downstream RAS signalling and induced apoptosis92. Although KRAS(G12C) had been thought to be constitutively active, the selective binding of ARS-853 to GDP-bound, inactive KRAS(G12C) provided evidence that KRAS mutants cycle between GTP-bound and GDP-bound states92.

The discovery of clinical KRAS(G12C) inhibitors continued with ARS-1620, which was the result of an effort to overcome metabolic stability and bioavailability limitations of ARS-853 to facilitate in vivo studies of KRAS(G12C) inhibition87. ARS-1620 is based on a novel quinazoline core scaffold, designed to better occupy the switch II pocket and thus rigidify a more favourable conformation for covalent reaction between the acrylamide electrophile and cysteine87. Ultimately, ARS-1620 was identified as the first KRAS(G12C) inhibitor suitable for in vivo studies and showed efficacy in KRAS(G12C) patient-derived xenograft models treated at 200 mg kg−1 once or twice per day87. The increased potency of this series of KRAS(G12C) inhibitors and success in in vivo models indicated that it might be possible to design clinically efficacious drugs.

Sotorasib (AMG-510) (Table 1) was the first selective KRAS(G12C) inhibitor to enter clinical trials in 2018 and was developed by Amgen, building on discoveries from a partnership with Carmot Therapeutics in which a custom library of small molecules that covalently bind cysteine were screened against KRAS(G12C)93. Molecules identified through this collaboration led to the discovery of a previously unknown pocket on KRAS (a cryptic pocket), which Amgen scientists exploited to discover sotorasib through structure-based design94. Sotorasib was designed to occupy the cryptic pocket by interacting with His95, Tyr96 and Gln99 (ref.94) (Fig. 3). A phase II clinical trial investigating sotorasib was successfully completed in 2020, and was followed by FDA approval for the treatment of adults with KRASG12C-mutated locally advanced or metastatic NSCLC in May 2021 (ref.95). Other covalent KRAS(G12C) inhibitors are entering clinical trials. Adagrasib (MRTX849) emerged from a joint drug discovery collaboration between Mirati Therapeutics and Array BioPharma, in which irreversible covalent inhibitors of KRAS(G12C) were identified; Mirati Therapeutics subsequently used structure-based design approaches to optimize adagrasib, which entered clinical trials in January 2019 (refs.96,97) (Fig. 3). JNJ-74699157 (ARS-3248; J&J and Wellspring Biosciences) was being investigated in patients with several types of advanced solid tumour that express KRASG12C, including NSCLC and colorectal cancer, but its clinical trials have been terminated98.

Fig. 3: Aligned structures of KRAS(G12C) co-crystallized with adagrasib (MRTX849) and sotorasib (AMG-510).

The covalent inhibitors adagrasib (PDB ID: 6UT0) and sotorasib (PDB ID: 6OIM) are bound to the switch II pocket, which is adjacent to the GDP-binding pocket93,96.

Designing small-molecule covalent KRAS(G12C)-selective inhibitors provides an elegant solution to drugging an undruggable cancer target. Before KRAS(G12C) inhibitors, recently discovered targeted covalent inhibitors in oncology were mostly identified using ligand-first approaches. The success of covalent KRAS(G12C) inhibitors validates an electrophile-first approach to covalent drug discovery and affirms the importance of covalent fragment screening techniques (discussed below). Optimization of initial hit compounds that emerge from covalent screening platforms, such as compound 12, can subsequently lead to programmes that produce potent covalent inhibitors such as sotorasib. In addition, the sotorasib story suggests that in other diseases in which a key protein target undergoes substitution of an amino acid to a cysteine residue, covalent inhibitors present an increasingly validated method to potentially provide precision therapy for patients.

SARS-CoV-2 main protease inhibitors

Vaccines against coronavirus disease 2019 (COVID-19) were developed at unprecedented speeds, and similar research momentum has led to the development of therapeutics that will benefit patients with COVID-19. In December 2021, the FDA issued an Emergency Use Authorization for Pfizer’s Paxlovid (a combination of nirmatrelvir and ritonavir) to treat mild-to-moderate COVID-19 (caused by SARS-CoV-2) in adults and some paediatric patients, marking the first approved oral treatment for the disease99. Nirmatrelvir (Table 1) covalently inhibits the Mpro of SARS-CoV-2 (ref.3). This programme highlights how the adaptation of relatively inactive peptidomimetics into potent and selective cysteine protease inhibitors can be accomplished by the addition of cysteine-reactive covalent functional groups to target the protease active site and by structurally informed medicinal chemistry efforts.

SARS-CoV-2 is a virus with a single-stranded RNA genome that encodes two polyproteins (pp1a and pp1ab) as well as structural and accessory proteins100. Viral replication depends on successful cleavage of pp1a and pp1ab by the Mpro (also referred to as 3CLpro), which is a cysteine protease, into functional viral proteins100. The discovery of covalent inhibitors against SARS-CoV-2 Mpro emerged from extensive work on protease inhibitors for SARS-CoV-1, the virus that causes severe acute respiratory syndrome (SARS1)100. During the 2002–2003 SARS1 outbreak, researchers used a crystal structure of the homologous porcine transmissible gastroenteritis coronavirus (TGEV) Mpro bound to a hexapeptidyl chloromethylketone covalent cysteine-reactive inhibitor to provide a foundation for the design of covalent inhibitors against SARS-CoV-1 Mpro (ref.101). Because of the homology across all Mpro, this work enabled the discovery of rupintrivir (AG7088), which is a mechanism-based inhibitor of the human rhinovirus (HRV) Mpro (ref.101).

Because the SARS-CoV-1 outbreak subsided, work into developing coronavirus Mpro inhibitors slowed until the emergence of SARS-CoV-2 in 2019. SARS-CoV-2 Mpro shares 96% sequence identity with SARS-CoV-1 Mpro, and there is 100% sequence overlap of the catalytic sites102. Renewed interest in improving on previous chloromethylketone inhibitors motivated researchers to adapt rupintrivir into a SARS-CoV-2 Mpro inhibitor potent enough to obtain a co-crystal structure. This discovery in turn enabled identification of the α-hydroxymethylketone-containing antiviral PF-00835231, which demonstrated potent SARS-CoV-2 Mpro inhibition in an activity assay based on fluorescence resonance energy transfer (FRET), activity in antiviral cell-based assays, stability in plasma and low clearance in vivo103.

In subsequent studies, the oral bioavailability of PF-00835231 was improved by replacing the α-hydroxymethylketone moiety with a nitrile group, which can also act as an electrophile103. Nitriles can covalently bind particularly reactive nucleophiles; however, the ease of thiol elimination from the thioimidate adduct makes nitriles more reversible than some other electrophiles, such as acrylamides104. Optimization from PF-00835231 eventually yielded PF-07321332, named nirmatrelvir (Fig. 4 and Table 1), which is a highly potent SARS-CoV-2 reversible covalent inhibitor that displayed potent inhibition in a FRET-based assay across all human coronaviruses, while no inhibitory effects were seen against human cysteine or serine proteases. The in vivo efficacy of nirmatrelvir was demonstrated in a mouse-adapted SARS-CoV-2 (SARS-CoV-2 MA10) model.

Fig. 4: Nirmatrelvir in complex with SARS-CoV-2 main protease.

The nitrile group of nirmatrelvir (shown in green) reacts with Cys145 of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (shown in light blue) to form a covalent thioimidate adduct. Extensive hydrogen-bond interactions (yellow dashed lines) occur throughout the pocket (PDB ID: 7RFW)103.

In phase II/III clinical trial data released in November 2021, Paxlovid was shown to be highly effective at preventing progression to severe COVID-19 in symptomatic patients105. The emergence of this orally bioavailable drug for COVID-19 will help to ameliorate illness for non-hospitalized patients in high-risk groups106. Overall, SARS-CoV-2 Mpro covalent inhibitors provide a promising avenue for the treatment of coronavirus infections either as monotherapies or in combination with other antiviral drugs.

The quick adaptation of previous protease inhibitors to selectively target SARS-CoV-2 Mpro is an example of using structure-based design while taking into account the valuable properties of covalent drugs. Researchers started with an electrophile-containing peptide and optimized both the peptide and electrophile to obtain a highly potent covalent inhibitor. In some ways, this story mirrors that of the discovery and optimization of JAK3 kinase inhibitors, in which EGFR inhibitors were adapted to target Cys909 of JAK3 (ref.79). Both JAK3 and mutant EGFR(T790M) contain a methionine gatekeeper, as well as a homologous cysteine adjacent to the ATP-binding site. These shared features enabled the discovery of potent JAK3-selective covalent inhibitors using an EGFR inhibitor as a starting point, similar to how rupintrivir provided the starting structure for the discovery of nirmatrelvir. Generally, when reactive cysteine residues are shared across a protein family, structurally guided adaptation of previously studied cysteine-reactive covalent inhibitors can lead to selective and potent drugs for other proteins with this feature.

HCV NS3/4a protease inhibitors

α-Ketoamide-based covalent inhibitors have been developed to treat hepatitis C virus (HCV) infection through inhibiting NS3/4a — a serine protease that cleaves the HCV polyprotein into multiple non-structural proteins required for replication107. Although HCV had been treated with a combination of PEGylated interferon-α and ribavirin, modest response rates and notable adverse events prompted studies that resulted in the discovery of NS3/4a protease inhibitors to treat HCV108. On the basis of initial observations that hexapeptide cleavage products could inhibit NS3/4a, the linear peptidomimetic inhibitors boceprevir (Merck)109,110 (Fig. 1) and telaprevir (Vertex)111,112 were designed. These compounds, along with narlaprevir113, use a ketoamide electrophile to covalently engage the catalytic serine of NS3/4a. This covalent interaction is relatively reversible owing to elimination of the serine alcohol group from the protein–inhibitor adduct114. Boceprevir and telaprevir were effective in treating HCV and were approved by the FDA in 2011 after successful trials115,116. However, telaprevir was withdrawn in 2014 owing to adverse events, and boceprevir was discontinued by Merck in 2015 owing to the superiority of newer direct-acting antivirals, in particular the ledipasvir–sofosbuvir combination (Gilead), which targets the HCV polymerases NS5a and NS5b117,118,119. Nevertheless, the success of ketoamide-based NS3/4a inhibitors in increasing the efficacy of interferon–ribavirin therapy emphasizes the utility of the ketoamide group as a serine-reactive electrophile in designing covalent antivirals.

Covalent proteasome inhibitors

Bortezomib (Takeda) was the first boron-containing drug to be approved by the FDA, and was approved for treatment of multiple myeloma in 2003 (ref.120). Bortezomib was discovered through optimization of an aldehyde-containing proteasome substrate peptide, and set a precedent for the discovery of HCV NS3/4a inhibitors such as telaprevir121,122,123,124. Exchange of the aldehyde electrophile for a boronic acid substantially increased the potency of proteasome inhibition121. The effect of bortezomib comes from the boronic acid covalently binding to the hydroxy group of the β5 subunit N-terminal threonine of the 20S proteasome, leading to inhibition of the proteasome’s chymotrypsin-like activity125. Bortezomib takes advantage of the increased sensitivity of haematological cancers such as multiple myeloma and mantle-cell lymphoma to proteasome inhibition123.

The development of bortezomib validated the proteasome as a cancer target, encouraging the discovery of other proteasome inhibitors. Medicinal chemistry efforts transformed the natural product epoxomicin into carfilzomib (Amgen), which was approved by the FDA in 2012 (ref.126) (Fig. 1). The epoxyketone moiety in carfilzomib forms a morpholino ring with the catalytic N-terminal threonine of the 20S proteasome, and this mechanism has been proposed to confer selectivity because most proteases do not have nucleophilic side chains at their N terminus127. Although the epoxyketone could have additional off-targets that bortezomib does not, this proposed mechanism of carfilzomib illustrates how covalency can help to drive selectivity through highly specific mechanisms.

Further work has been done to identify orally bioavailable proteasome inhibitors to improve upon bortezomib, which is administered intravenously128. Ixazomib (Takeda) is a second-generation proteasome inhibitor that also contains a boronic acid group129. This drug is administered orally as the prodrug ixazomib citrate, a boronic ester that hydrolyses upon exposure to aqueous media or plasma130,131. Oprozomib (Amgen), a second-generation epoxyketone-containing proteasome inhibitor, can also be orally administered132.

The discovery of these covalent proteasome inhibitors illustrates the utility of boronic acid electrophiles for targeting protease active sites and demonstrates how a covalent drug discovery project can use an unoptimized electrophile as a starting point and subsequently introduce an alternative electrophile. Just as a chloromethylketone group in a SARS-CoV-2 ligand was optimized to the nitrile in nirmatrelvir, bortezomib was discovered through optimization from an aldehyde. Although targeting especially nucleophilic protease active sites might provide greater flexibility in terms of electrophile choice, using highly reactive electrophiles to gain an initial foothold can be beneficial.

Other boron-containing drugs

Several additional boron-containing drugs inhibit serine hydrolases beyond the proteasome through formation of covalent adducts between catalytic serine residues and boron133,134. Most of these drugs contain benzoxaborole groups and have been discovered through screening of compound collections that contain boron-based electrophiles135. For example, the antifungal tavaborole (Pfizer) was discovered through focused screening of boron-containing compounds previously investigated as antibacterials, and was approved by the FDA to treat onychomycosis in 2014 (refs.136,137). Tavaborole covalently binds to the 2′ and 3′ hydroxy groups on the 3′ terminus of leucyl-tRNA, trapping the tRNA–tavaborole adduct in the editing site of leucyl-tRNA synthetase to block protein synthesis138. Crisaborole (Pfizer), a phosphodiesterase 4 inhibitor, was approved by the FDA to treat atopic dermatitis in 2016, and the β-lactamase inhibitor vaborbactam (Rempex) was approved to treat various bacterial infections in 2017 (refs.139,140). These boron-based drugs target serine hydrolases, and the weak boron–sulfur bond potentially provides selectivity for serine over cysteine hydrolases121. The reversibility of the serine–boron bond hinders chemoproteomic profiling experiments commonly used to characterize the selectivity of covalent ligands. But as the serine hydrolase family is rich with potential drug targets, the numerous boron-containing drugs in clinical trials hold promise across a wide variety of disease types141.

Mutant-haemoglobin modulators

Voxelotor (Global Blood Therapeutics) is a lysine-targeting covalent drug used to treat sickle cell anaemia, and the discovery of voxelotor (Table 1) was dependent on knowledge of heightened lysine side chain reactivity. Sickle cell anaemia is caused by a single mutation in the gene encoding the β-haemoglobin chain that induces polymerization of mutant haemoglobin (HbS) under hypoxic conditions142. An aldehyde-containing natural product and several synthetic aldehyde analogues were found to prevent polymerization by increasing the affinity of HbS for oxygen143. These aldehydes bind in a reversible covalent manner, forming a Schiff base with the N-terminal valine of the α-haemoglobin chain143. Almost 50 years ago, this N-terminal amine was discovered to have a particularly low pKa of 6.9, indicating that it is primarily unprotonated under physiological conditions and, thus, more nucleophilic144. Based on earlier aldehydes, voxelotor was discovered through a structure-guided effort to discover compounds that increase the oxygen affinity of HbS, and was designed to bind the HbS tetramer in a 1:1 stoichiometry, unlike the 2:1 ratio of earlier compounds143,145. With a remarkable red blood cell to plasma ratio of ~150 that likely reduces off-target effects, voxelotor was approved by the FDA in 2019 with a recommended dose of 1.5 g daily, which is an unusually high dose for a covalent drug142,146. The success of voxelotor is similar to that of ibrutinib in demonstrating that covalent drugs can be dosed at high amounts given favourable absorption, distribution, metabolism and excretion (ADME) properties. Furthermore, the discovery of voxelotor shows how the identification of unusually reactive amino acid residues, such as the α-haemoglobin N-terminus, provides opportunities for drug discovery.
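The link between a low pKa and nucleophilicity can be made concrete with the Henderson–Hasselbalch relationship. The following is a minimal sketch, assuming a physiological pH of 7.4 and a typical lysine side-chain pKa of about 10.5 (a textbook value that is not taken from this article); the function name is hypothetical and chosen for illustration only.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Fraction of an amine in its neutral (unprotonated) form.

    Henderson-Hasselbalch for the conjugate acid R-NH3+:
    [R-NH2] / ([R-NH2] + [R-NH3+]) = 1 / (1 + 10**(pKa - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PHYSIOLOGICAL_PH = 7.4          # assumption: typical blood pH
HB_ALPHA_N_TERMINUS_PKA = 6.9   # value quoted in the text
TYPICAL_LYSINE_PKA = 10.5       # assumption: textbook lysine side-chain pKa

n_term = fraction_unprotonated(HB_ALPHA_N_TERMINUS_PKA, PHYSIOLOGICAL_PH)
lysine = fraction_unprotonated(TYPICAL_LYSINE_PKA, PHYSIOLOGICAL_PH)

print(f"N-terminal valine: {n_term:.1%} unprotonated")
print(f"Typical lysine:    {lysine:.2%} unprotonated")
```

Under these assumptions, roughly three-quarters of the N-terminal amine is unprotonated at pH 7.4, compared with well under 1% for a typical lysine side chain, which is consistent with the heightened reactivity described above.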

The covalent drug discovery toolbox

Many covalent drug discovery programmes, including those for covalent EGFR and BTK inhibitors, have involved the addition of reactive functional groups to previously identified ligands. However, emerging technologies make it possible to approach covalent ligand discovery from an electrophile-first perspective, in which covalent ligands against protein targets of interest are identified before structure-based optimization. For example, this type of approach proved successful in drugging KRAS(G12C) and has facilitated the rapid discovery of E3 ligase ligands for targeted protein degradation applications. Activity-based protein profiling (ABPP) approaches have transformed the characterization of electrophilic compounds, facilitating selectivity profiling and target identification experiments. Special considerations must also be made in evaluating the binding affinity and reactivity of covalent ligands (Box 1). In this section, we discuss the toolbox of emerging techniques that covalent drug discovery relies on.

Screening platforms

Many emerging covalent ligand screening platforms involve MS-based detection, enabled by covalent bond formation between the compound and protein. Other phenotypic, DNA-encoded or computational approaches have also been used, which are then paired with MS-based validation, ABPP-based experiments to inform selectivity and structural biology to enable medicinal chemistry. The most prominent methods for covalent screening are summarized in Table 2. The growth of commercial libraries of electrophilic fragment-like compounds is a key factor contributing to the rise of these electrophile-first discovery strategies.

Table 2 Comparison of screening methods for covalent drug discovery

Intact protein MS

Originally, MS-based compound screening grew out of ‘tethering’, which is a technique employed since 2000 that uses libraries of compounds linked to disulfides to identify fragments that bind cysteine-adjacent pockets90,147. Molecules that bind undergo disulfide exchange with the cysteine (which could be endogenous or engineered) to form an adduct with the protein, and pooled screening with MS detection can identify the bound compound. Binding fragments can be combined or grown in a fragment-based approach to identify high-affinity ligands. This strategy was originally designed to identify reversible ligands for challenging targets, but covalent binding can be maintained by replacing the disulfide with an electrophile such as an acrylamide, as in the case of the initial covalent KRAS(G12C) inhibitors88. This approach has been used to discover compounds that modulate protein–protein interactions148,149, and a more recent study employed a similar tethering strategy that uses aldehydes to form imines with lysine residues150.

Over the past decade, covalent ligand discovery has shifted towards screening more drug-like electrophilic fragments13. In 2012, researchers curated 177 electrophilic compounds from the Pfizer compound collection and used an MS-based primary screening strategy to identify covalent inhibitors of the interaction between hypoxia-inducible factor 1α (HIF1α) and aryl hydrocarbon receptor nuclear translocator (ARNT)151. This approach was informed by determining an X-ray crystal structure of the HIF1α–ARNT complex151. Around the same time, the concept of tethering was expanded by using a small set of acrylamides and an MS-based assay to identify thymidylate synthase inhibitors, an approach termed kinetic template-guided tethering152. Building on these studies, an acrylate functionality was appended to 100 fragments to identify non-peptidic inhibitors of the cysteine protease papain153. Electrophilic fragments were pooled, and screening through electrospray ionization (ESI) MS led to the identification of hit compounds out of pooled experiments. Use of a similar library of acrylates enabled the discovery of covalent inhibitors for the HECT E3 ligase NEDD4-1 (ref.154). As with the HIF1α–ARNT protein–protein interaction inhibitors discovered at Pfizer, co-crystal structures were crucial to understanding the mechanism of these NEDD4-1 inhibitors, which prevent association of ubiquitin with the E3 ligase and thus induce a switch from a processive to a distributive mechanism. In another example, acrylate-based inhibitors of the RBR E3 HOIP were also discovered using MS-based screening, highlighting how covalent fragment screening approaches can be useful for protein classes that have been challenging to discover ligands against, such as E3 ligases155.

More recently, a commercial library of 993 acrylamides and chloroacetamides was screened using intact MS to identify ligands of the deubiquitinase OTUB2 and the pyrophosphatase NUDT7 (ref.156). The authors used co-crystal structures with OTUB2 and NUDT7 in complex with the hit compounds to inform fragment growing to increase potency. Although previous studies had paired MS-based screening with structural information to identify cysteine residues in functional sites or to understand the mechanism of inhibition, in this case, the pairing of MS-based screening and structure-guided fragment-based drug discovery supported optimization of the potency of the hit compounds.

The same compound collection was also used to screen against the peptidyl-proline cis–trans isomerase Pin1, which is overexpressed or activated in several tumour types but has been challenging to target selectively157. The resulting chloroacetamide sulfopin was shown to be selective for Pin1 in a covalent inhibitor target-site identification (CITe-Id) chemoproteomics experiment and was effective in regressing neuroblastoma growth in mice157. This result suggests that although chloroacetamides have disadvantages, including rapid metabolism, they can be valuable tool compounds with which to assess target relevance in various disease models.

One particularly powerful example of covalent ligand screening is the discovery of the initial compounds in the series that led to the first approved KRAS(G12C) inhibitor, sotorasib. A library of 3,300 acrylamides was screened in three assays: a thiol reactivity assay, a RAF-coupled nucleotide exchange assay and an intact MS assay158. Combined with crystallographic data that showed how ligand binding revealed the presence of previously closed sub-pockets, this effort provided the basis for the rapid discovery of KRAS(G12C) inhibitors discussed above.

Covalent DNA-encoded libraries

DNA-encoded libraries (DELs) present an alternative approach that enables the screening of massive libraries of small molecules. Unlike MS or ABPP approaches, there is no specialized advantage of covalency in enabling DEL screening, but the throughput of DELs allows for the screening of much larger covalent libraries through a workflow of immobilization, enrichment, amplification and sequencing. The first reports of electrophilic protein–nucleic acid-encoded libraries described the targeting of protease active sites159, but over the past 5 years, cysteine-targeted DNA-encoded or protein–nucleic acid-encoded libraries have been used to identify covalent ligands for bromodomains, including PCAF and BRD4, as well as for JNK1, MEK2 and HER2 (refs.160,161,162). Further work has explored improvements in the enrichment step of the covalent DEL screening workflow, which differs because covalent engagement prevents elution of DNA-tagged molecules163. A covalent ligand of mitogen-activated protein kinase kinase 6 (MAP2K6) was also identified serendipitously through screening of a DEL against a DNA-encoded protein library164. More recently, even larger covalent DELs (with approximately 100,000,000 members) have been developed and used to identify acrylamide-based and epoxide-based BTK inhibitors with novel scaffolds165. In this study, similar screening results were obtained after storing the library at –80 °C for several years, suggesting that the electrophilic compounds are sufficiently stable in this context. The growing size of covalent DELs and the increasing commercial availability of DELs represent an exciting development, and although unique considerations with respect to enrichment workflow and compound stability must be kept in mind, DELs that contain electrophilic molecules may become more widespread in covalent ligand discovery.
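The sequencing step of this workflow is usually converted into a per-compound enrichment score. The sketch below shows one simple way this might be done, assuming that barcode read counts are available both for the library before selection and for the material retained on the target; the barcode names, read counts and pseudocount are hypothetical, and the scoring scheme is an illustration rather than the method of any study cited here.

```python
from typing import Dict


def fold_enrichment(
    selected: Dict[str, int],
    naive: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Read frequency after selection divided by read frequency before.

    A pseudocount avoids division by zero for barcodes that were not
    sequenced in one of the two samples.
    """
    barcodes = set(selected) | set(naive)
    total_selected = sum(selected.values()) + pseudocount * len(barcodes)
    total_naive = sum(naive.values()) + pseudocount * len(barcodes)
    scores = {}
    for barcode in barcodes:
        freq_selected = (selected.get(barcode, 0) + pseudocount) / total_selected
        freq_naive = (naive.get(barcode, 0) + pseudocount) / total_naive
        scores[barcode] = freq_selected / freq_naive
    return scores


# Hypothetical read counts for three barcoded library members.
naive_counts = {"cmpd_A": 100, "cmpd_B": 100, "cmpd_C": 100}
selected_counts = {"cmpd_A": 900, "cmpd_B": 50, "cmpd_C": 50}

scores = fold_enrichment(selected_counts, naive_counts)
ranked = sorted(scores, key=scores.get, reverse=True)
print("Most enriched barcode:", ranked[0])
```

Scores above 1 indicate barcodes that became more frequent after selection. For covalent libraries, the retained material cannot simply be eluted, so how the selected sample is recovered for amplification is a separate experimental question that this sketch does not address.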

Covalent docking

The advantages of various covalent docking methods in different covalent docking scenarios have been described elsewhere166,167. Most computational programs for covalent docking explicitly link the ligand to the protein and model conformations under the constraint of a predefined bond between the ligand and the corresponding amino acid residue. The covalent docking platform GOLD relies on this assumption; for example, it was used in a virtual screen for covalent inhibitors of the NEDD8-activating enzyme (NAE), in which three of the hits were confirmed as novel NAE inhibitors168. Development of the DOCKovalent method for virtual screening facilitated the discovery of boronic acid AmpC β-lactamase inhibitors, as well as cyanoacrylamide inhibitors of JAK3 (ref.169). DOCKovalent uses non-covalent docking methods to pre-generate conformations and states for an electrophilic virtual library and then samples each state against the target nucleophile. The same method has been used to identify new covalent inhibitors of the kinase MKK7 (ref.170), as well as compounds that bind KRAS(G12C) to destabilize the protein and accelerate nucleotide exchange171.

Apart from screening, covalent docking is a useful tool for investigating binding modes of known covalent ligands. For example, GOLD has been used to model the binding modes of the aldehyde-containing proteasome inhibitor MG132 (ref.172). AutoDock uses a flexible side chain approach, whereby the covalent ligand is treated as an amino acid side chain and poses of this flexible ‘side chain’ are scored, using a physics-based scoring function that evaluates the energetics of ligand–protein interactions, as the remainder of the protein is held rigid166. Another method, called CovDock, is based on the Schrödinger Glide docking algorithm and Prime structure refinement methodology. CovDock uses traditional non-covalent docking approaches to dock a ligand to a protein target, and then models the covalent bond attachment and refines the complex166. This approach does not consider the reactivity of the electrophile, which can limit the ability to virtually study the differences in docking ligands with different electrophilic functional groups167.

Overall, covalent docking software provides useful information on ligand–protein interactions when key assumptions can be made, namely when the reactivity of the electrophile and the site of modification are known.

Chemoproteomics-enabled discovery

Chemoproteomic platforms enable the identification of covalent compounds and their corresponding ligandable sites on target proteins directly in complex biological systems. Advances in chemoproteomics have facilitated the discovery of covalent ligands against undruggable disease targets and enabled the selectivity profiling of covalent ligands across the proteome to identify targets and off-targets of these ligands. Here, we discuss the chemoproteomics profiling of reactive ligandable hotspots, ABPP screening platforms and target identification within the context of recent work relevant to drug discovery. Chemoproteomics experiments can provide key information on ligand selectivity and give early guidance for selecting targets for covalent drug discovery programmes.

Chemoproteomics profiling of reactive ligandable hotspots

ABPP facilitates the discovery of covalently ligandable sites and the corresponding ligands in complex biological samples. ABPP was pioneered by Cravatt and Bogyo using active-site-directed chemical probes that covalently target catalytic residues of various enzyme classes, including hydrolases, proteases and kinases173,174. This technique, often using gel-based assays, was employed to gain functional readouts of active enzymes in biological contexts14,173,175. ABPP probes contain a warhead that covalently reacts with nucleophilic amino acids (such as cysteine) and a reporter handle to monitor probe binding, such as a fluorophore, biotin or alkyne moiety for subsequent click chemistry-enabled applications173 (Fig. 5a).

Fig. 5: Isotopic tandem orthogonal proteolysis–activity-based protein profiling.

a | Example of a reactive probe for activity-based protein profiling (ABPP) designed with a broadly reactive electrophilic warhead linked to an analytical handle. b | Schematic of the competitive isoTOP-ABPP (isotopic tandem orthogonal proteolysis–activity-based protein profiling) methodology. Treatment of cells or lysate with a protein-reactive compound prevents subsequent binding of the pan-reactive probe, and this competitive ligand binding can be detected by tandem mass spectrometry (MS/MS) after an enrichment step, to indirectly identify covalent protein targets for the compound of interest. TEV, tobacco etch virus.

Instead of focusing on active sites, more recent ABPP approaches use MS and broadly reactive chemical probes to also map allosteric sites176. In the first reports of the isoTOP-ABPP (isotopic tandem orthogonal proteolysis–activity-based protein profiling) approach, an iodoacetamide probe functionalized with an alkyne handle was used to identify hyper-reactive cysteines across the proteome176,177 (Fig. 5b). The alkyne handle can be used to link the probe-modified protein to a tobacco etch virus (TEV) protease-cleavable tag that contains an azide group and biotin moiety separated by either an isotopically light or heavy valine (Fig. 5b). These functionalities enable enrichment of probe-modified peptides and tandem analysis of the light and heavy samples with MS, which, after controlling for run-to-run variability, allows for quantitative comparisons between samples, including competitive analysis of covalent compounds (Fig. 5b). Using this approach, it was discovered that the hyper-reactivity of cysteines predicts their functionality in catalysis and at sites of post-translational modifications177. Recent adaptations of isoTOP-ABPP have been developed to increase the coverage of cysteines across the proteome and to increase the throughput. For example, optimization of the sample preparation steps (namely single-pot, solid-phase-enhanced sample preparation (SP3)) and combination of this workflow with off-line fractionation and field asymmetrical waveform ion mobility spectrometry (FAIMS) allow for additional separation before MS detection, and enabled the identification of more than 30,000 reactive cysteines across a panel of tumour cell lines178. Additionally, as new probes are developed for other nucleophilic amino acids, reactivity profiling of other amino acid hotspots (such as lysines) across the proteome will allow for the expansion of this technology beyond cysteine.
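The quantitative comparison at the heart of competitive isoTOP-ABPP can be sketched as a ratio calculation. The example below assumes that, for each probe-modified peptide, one isotopic channel carries the vehicle-treated control and the other the compound-treated sample; the site names, intensities and the ratio cutoff of 4 are hypothetical and serve only to show the logic.

```python
from typing import Dict, List, Tuple


def competition_ratios(
    peptides: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Return the control/treated MS intensity ratio for each peptide.

    Each value is (control_intensity, treated_intensity). A ratio near 1
    means the compound did not block probe labelling; a large ratio means
    the cysteine was occupied by the compound before the probe was added.
    """
    return {
        site: control / treated
        for site, (control, treated) in peptides.items()
        if treated > 0
    }


def liganded_sites(ratios: Dict[str, float], threshold: float = 4.0) -> List[str]:
    """Sites whose ratio meets the assumed cutoff, highest ratio first."""
    hits = [site for site, ratio in ratios.items() if ratio >= threshold]
    return sorted(hits, key=ratios.get, reverse=True)


# Hypothetical intensities; the site names are made up for illustration.
example = {
    "PROT1_C100": (8.0e6, 1.0e6),
    "PROT2_C45": (5.0e6, 4.8e6),
    "PROT3_C212": (3.0e6, 0.5e6),
}

ratios = competition_ratios(example)
print(liganded_sites(ratios))
```

Because the readout is loss of probe labelling, the compound's targets are identified indirectly, as noted in the legend of Fig. 5b. Peptides with no signal in the treated channel are skipped in this sketch; a real analysis would need an explicit rule for them.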

Overall, reactivity profiling generates large quantities of information across thousands of proteins, ultimately providing a relatively unbiased picture of nucleophilic (usually cysteine) amino acid reactivity. This information can be used to either select for appropriate protein targets in drug discovery programmes or to identify allosteric sites on proteins of interest that may have previously been considered un-ligandable or undruggable. As an example of targeting a traditionally undruggable protein with a covalent molecule, ABPP was paired with a MYC transcription factor activity assay to identify a covalent MYC ligand, EN4 (ref.179). EN4 targets Cys171, which is located within a predicted intrinsically disordered region of MYC, and showed selectivity on a proteome-wide scale in profiling of more than 1,500 cysteines using competitive isoTOP-ABPP. Cys171 was initially identified as a ligandable hotspot on MYC through analysis of compiled cysteine-reactive chemoproteomics data, and this information spurred the subsequent search for a selective covalent ligand against that cysteine.

Activity-based protein profiling screening platforms

The use of competitive isoTOP-ABPP was expanded to identify proteome-wide targets of a small covalent fragment library by competing individual acrylamide and chloroacetamide fragments against an iodoacetamide–alkyne probe180. In this study, more than 700 ligandable cysteines were identified, and information was provided about the proteome-wide selectivity of each covalent fragment in the library. The covalent ligands discovered with this approach and their corresponding ligandable sites were used to help elucidate the role of caspase 8 and caspase 10 in extrinsic apoptosis in T cells, showing that this approach can rapidly identify compounds that target proteins of biological interest. Several studies have built on this work, using isoTOP-ABPP to find new covalent ligands. For example, cysteine reactivity and ligandable sites were mapped in NSCLC lines with mutant Kelch-like ECH-associated protein 1 (KEAP1) and compared with those in NSCLC lines with wild-type KEAP1. The authors discovered compounds that bind to a ligandable cysteine on the nuclear receptor NR0B1, which is regulated by NRF2, a substrate of KEAP1 (ref.181). Cysteine ligandability has also been explored in activated T cells, using promiscuous fragment-like compounds, termed ‘scout fragments’, to map ligandability, and functional assays to identify more structurally complex electrophilic compounds that suppress T cell activity182. With exciting implications for drug discovery, this approach was also used to identify several proteins that could be targeted covalently to impair T cell activity, including BIRC2 and BIRC3, the nucleosome remodelling deacetylase (NuRD) complex, and the kinases ITK and CYTIP.

To dramatically increase sample throughput, a tandem mass tag (TMT)-based streamlined cysteine activity-based protein profiling (SLC-ABPP) methodology was designed and used to profile an electrophilic fragment library at an impressive depth of more than 8,000 reactive cysteine sites with a total instrument time of 18 min per compound183. As competitive isoTOP-ABPP becomes more high throughput, comprehensive selectivity and reactivity information will rapidly become available for diverse covalent reactive libraries within a wide context of biological disease states. This information will enable rapid identification of covalent ligands against reactive hotspots, along with providing selectivity information on each ligand. When paired with a parallel phenotypic screen against a desired outcome (such as cancer cell death), this methodology facilitates the identification of functional covalent ligands and their corresponding protein targets in a high-throughput manner.

Chemoproteomics platforms for target identification

IsoTOP-ABPP can also be used to identify the protein targets of known electrophilic drugs. As an example, this approach was applied to dimethyl fumarate, which is used to treat autoimmune disease. Although dimethyl fumarate has been used for three decades to treat psoriasis and was approved by the FDA in 2013 for the treatment of multiple sclerosis, the direct covalent targets of dimethyl fumarate remained unclear until more recently. In separate studies, chemoproteomic approaches were used to identify protein kinase Cθ (PKCθ) and IRAK4 as targets of dimethyl fumarate184,185. In both cases, covalent engagement of a cysteine residue disrupted a protein–protein interaction to modulate immune cell function. Disrupting the interaction of PKCθ with the costimulatory receptor CD28 reduced T cell activation, and disrupting the IRAK4–MYD88 interaction suppressed the production of interferon-α in plasmacytoid dendritic cells184,185.

ABPP can also be used to identify off-targets and generally assess the selectivity of covalent molecules. For example, SLC-ABPP was used to analyse spleen tissue extracted from C57BL/6 mice treated with ibrutinib183. Of ~9,200 cysteine sites identified, BTK Cys481 was one of the cysteines most liganded by ibrutinib. Cys313 on B lymphocyte kinase (BLK), which, analogous to BTK, contains a cysteine within the ATP-binding pocket, was also identified as an off-target of ibrutinib183.

Novel screening platforms are also often easily paired with isoTOP-ABPP target identification experiments. For example, a multiplexed in vivo screening platform was developed in which barcoded pancreatic ductal adenocarcinoma lines were pretreated with electrophilic compounds and injected into mice to observe the compound-dependent decrease in metastatic potential186. IsoTOP-ABPP experiments also enabled the identification of the lipase ABHD6 as the target of hit compounds from this screen, even though ABHD6 was not previously known to have a role in metastasis or cancer progression. Beyond identifying the lipase ABHD6 as crucial for metastatic fitness, this approach enabled screening in a biological context more relevant to the disease state through adaptation of covalent ligand screening to a multiplexed in vivo phenotypic assay. In general, target identification experiments using chemoproteomic platforms are crucial in investigating the mechanism of action of electrophilic compounds discovered through phenotypic assays.

Covalent ligand discovery for induced proximity modalities

Covalent ligands are not only useful as functional inhibitors, but also have important roles in emerging induced proximity modalities. Covalent drug discovery platforms have facilitated the expansion of targeted protein degradation approaches by enabling the discovery of covalent recruiters against E3 ubiquitin ligases14,187,188,189,190,191,192,193. Although most bifunctional degrader molecules (proteolysis-targeting chimeras (PROTACs)) recruit the E3 ligases cereblon (CRBN) or von Hippel–Lindau (VHL) protein to degrade target proteins, there are more than 600 E3 ligases with varying substrate scopes. Since 2019, covalent recruiters have been used to validate a large proportion of the E3 ligases that have been harnessed for targeted protein degradation, including RING finger protein 114 (RNF114), RNF4, DDB1 and CUL4-associated factor 16 (DCAF16), DCAF11, KEAP1 and, most recently, fem-1 homologue B (FEM1B)187,188,189,190,191,192,193. In 2019, isoTOP-ABPP was used to identify RNF114 as the target of the enone-containing natural product nimbolide, which was used to make bifunctional degraders of bromodomain-containing protein 4 (BRD4) and BCR–ABL187. In a separate study, scout fragments were used to construct bifunctional FKBP12 and BRD4 degraders, and the authors identified DCAF16 as the target E3 ligase responsible for degradation190. These discoveries led to the variety of covalent E3 recruiters now available, which have been reviewed elsewhere194. On the basis of analyses of chemoproteomic data sets assessing cysteine reactivity, 97% of E3 ligases possess reactive cysteines, suggesting that covalent approaches to harness more E3 ligases could continue to be successful194.

Beyond degradation, the identification of non-inhibitory covalent ligands has the potential to contribute to the discovery of novel induced proximity modalities. For example, a targeted protein stabilization platform, termed deubiquitinase-targeting chimeras (DUBTACs), has been developed using a covalent deubiquitinase recruiter195. Through analysis of chemoproteomic data and an ABPP-based screen, a covalent OTUB1 recruiter was discovered that could be incorporated into bifunctional compounds that stabilize mutant cystic fibrosis transmembrane conductance regulator (CFTR), the degradation of which drives cystic fibrosis. In general, the identification of non-inhibitory, allosteric ligands through covalent ligand screening has the potential to facilitate recruitment of other enzymes (such as kinases and deacetylases) to target proteins and direct protein function to neosubstrates for therapeutic benefit.

Nucleophilic covalent ligands

Most protein-reactive covalent drugs tend to be electrophilic, to enable reactions with nucleophilic amino acids. By contrast, nucleophilic drugs can react with electrophilic cofactors and post-translational modifications. Several hydrazine-containing compounds act as mechanism-based inhibitors of monoamine oxidase (MAO) A and B, whereby activation by MAO enables alkylation of a flavin cofactor, inhibiting the enzyme196. ABPP principles and ‘reverse-polarity’ probes have been employed to study electrophilic post-translational modifications, such as N-terminal pyruvoyl and glyoxylyl modifications, that can react with hydrazine-containing compounds197. Hydrazine-based probes have also been used to help identify and characterize ligands for proteins with electrophilic cofactors or post-translational modifications198,199.

Lysine-directed covalent ligands

The low abundance of cysteine enables selectivity but limits opportunities for covalently targeting specific proteins of interest. This problem has driven scientists to investigate the targeting of other nucleophilic amino acids, particularly lysine. Owing to the low nucleophilicity of the ε-amino group of lysine under physiological conditions, the discovery of efficient lysine-targeting covalent ligands requires identification of unusually reactive lysines. IsoTOP-ABPP experiments using several lysine-directed probes have proved powerful in profiling lysine reactivity across the proteome200,201. Through isoTOP-ABPP experiments, more than 9,000 ligandable lysines were identified and more elaborated pentafluorophenol-containing or N-hydroxysuccinimide-ester-containing compounds could selectively label specific proteins of interest200,201. Building on these experiments, a library of approximately 180 electrophiles was assembled and isoTOP-ABPP experiments then performed to assess the selectivity of different chemotypes and identify lysines ligandable with small molecules202. This study yielded more broadly reactive electrophiles, such as dicarboxaldehydes, that could be used for further lysine profiling experiments, but also identified less-reactive electrophiles, including N-acyl-N-alkyl sulfonamides, which had been previously used as tools for bioconjugation in cells202,203.

Aside from voxelotor, which was discovered through optimization from fragment-like aldehydes, most lysine-targeting ligands have been designed from an existing ligand using structure-based methods. This is often achieved by rationally placing an electrophilic sulfonyl fluoride, fluorosulfate or vinyl sulfone in an appropriate orientation to react with the ε-amino group of a lysine adjacent to an established binding site. Such ligands include a kinetic transthyretin stabilizer204, an isoform-selective PI3Kδ inhibitor205 and inhibitors of cyclin-dependent kinase 2 (CDK2)206 and Hsp90 (ref. 207). A sulfonyl fluoride-bearing promiscuous kinase inhibitor that targets a conserved lysine in the ATP-binding site was also used as a probe to profile kinase inhibitor selectivity in live cells208. However, sulfonyl fluoride-based probes are not completely selective for lysine and have also been used to target tyrosine residues209. A recent preprint reported the use of a chemoproteomic approach to profile the amino acid reactivity preference of 54 different electrophiles, which should prove to be a valuable resource for covalent ligand discovery210. Combining comprehensive electrophile profiling, lysine-directed chemoproteomics and structure-guided approaches will enable scientists to leverage the abundance of lysine residues adjacent to ligand binding sites to enhance covalent drug discovery.

Outlook

Over the past decade, advances in covalent drug discovery have led to successful drugs, including inhibitors of EGFR, BTK, KRAS(G12C) and SARS-CoV-2 Mpro. The approvals of these drugs represent milestones that showcase the evolution of covalent drug discovery from a serendipitous effort to a field with established roadmaps for success.

Adoption of electrophile-first discovery strategies represents a notable shift in the field. Ligand-first strategies will continue to be highly applicable for designing covalent drugs against proteins when existing reversible ligands are already known to bind near a nucleophilic amino acid such as cysteine. However, we anticipate that electrophile-first approaches will be increasingly employed, especially when the discovery of reversible ligands proves challenging. Electrophile-first approaches will be facilitated, in part, by chemoproteomics experiments that profile amino acid reactivity across the proteome, leading to the identification of novel ligandable residues, such as cysteines, that could be targeted with electrophilic compounds.

Additionally, we expect to see an increase in research that explores the use of reversible covalent mechanisms that strike a balance between potency and selectivity in various contexts. Reversible covalent compounds with slow off-rates might achieve a desired therapeutic effect while minimizing covalent off-target effects, and the use of more varied electrophiles will allow for increasingly tailored reactivity. We also look forward to further improvements in chemoproteomic workflows that enable additional multiplexing in MS experiments, which will be valuable for assessing compound selectivity and target engagement.

Covalent drug discovery overcomes obstacles in designing ligands against otherwise undruggable protein targets. We expect that the unique features of covalent ligands will continue to spur the discovery of covalent drugs.