Introduction

Peptides and proteins are the most ubiquitous biomolecules in living systems and are responsible for orchestrating a plethora of functional and structural roles in the cell. The final structure and function of a given polypeptide are dictated by the specific sequence of 21 proteinogenic amino acids that is encoded within the genome of the organism. The flow of genetic information from DNA to peptides and proteins (the central dogma of molecular biology) involves the transcription of genetic information in DNA into mRNA, followed by the chemical synthesis of polypeptides at ribosomes based on the genetic information encoded in mRNA (translation). However, the total ensemble of proteins within a cell (the proteome) is far more complex than the genome of an organism alone would allow. For example, in humans, 25,000 genes are thought to lead to a proteome in excess of a million proteins. This enormous diversity arises from the chemical modification of proteins after ribosomal synthesis, leading to further diversification of an otherwise concise proteome. These so-called post-translational modifications (PTMs) encompass a wide range of chemical alterations to the protein structure, such as functionalization of amino acid side chains, and have been shown to modulate the structure and function of several proteins in profound ways1–5. It is widely accepted that nature would not expend energy modifying polypeptides unless the products fulfil a highly important biological role, yet the effect of a given modification on the structure, stability, function and activity of the majority of peptides and proteins across all taxa remains unknown.

The exquisite specificity and potency of peptides and proteins at their targets have led to a renaissance in their use as therapeutics over the past decade6–8. This is reflected in the United States Food and Drug Administration (FDA) approval rates, which are now more than double those of small molecules. Currently, there are more than 100 approved peptide drugs and over 200 protein therapeutics approved for clinical use9, accounting for more than 10% of the pharmaceutical market (ca. US$40 billion). This number is set to greatly increase in the coming years with hundreds of peptides and proteins currently in clinical trials or undergoing preclinical assessment. Peptide and protein drugs have typically been associated with two key drawbacks that can limit their therapeutic applicability: first, large molecular weight, which hampers bioavailability; and second, the presence of native peptide bonds, which are susceptible to proteolytic degradation, leading to short biological half-lives. In some cases, these shortcomings have been alleviated through the incorporation of native PTMs, for example, glycosylation, or through the installation of ‘bespoke modifications’ such as PEGylation10, lipidation or N-methylation.

While access to large polypeptides and proteins is typically achieved through biological expression in prokaryotic and eukaryotic systems, tailored modifications such as biotinylation, installation of d-amino acids11 or fluorescent tags remain challenging to access using the cellular machinery, despite several advances in unnatural amino acid incorporation12. As such, chemical synthesis has emerged as an attractive avenue to introduce specific modifications site-selectively and homogeneously on a protein of interest. This is in stark contrast to recombinant methods, in which the enzymatic nature of the PTM process results in inseparable heterogeneous mixtures of the target proteins.

Solid-phase peptide synthesis (SPPS) remains the most efficient platform for the chemical preparation of polypeptides up to 40–50 amino acid residues in length13. However, the linear nature of SPPS means that longer syntheses are usually plagued by peptide chain aggregation and steric crowding en bloc, which leads to truncated (uncoupled) sequences, unwanted side products and epimerization. The accumulation of these by-products over iterative steps results in low yields and purities of the final products. The size limit of polypeptide targets that can be generated by SPPS has inspired the development of chemical ligation methods to convergently assemble smaller peptide fragments to generate larger polypeptides and proteins. Early work in this area relied on the condensation of side-chain protected fragments. While this approach has been successfully exploited for the synthesis of peptide therapeutics, including on a production scale14, the poor solubility of crude protected fragments, as well as the susceptibility of the C-terminal amino acid residue to epimerize during activation, has made this approach largely unattractive. A key solution to these problems was provided through the development of peptide ligation technologies that facilitate the formation of native peptide bonds between completely unprotected peptide fragments15–21. These technologies have underpinned the chemical synthesis of numerous peptide and protein targets that were previously inaccessible through recombinant methods or SPPS and have therefore played a key role in addressing a number of important questions in biology and medicine3,22,23. Of the ligation technologies developed to date, the convergent assembly of peptide fragments by native chemical ligation (NCL) represents the most widely employed method15. The power of this methodology is highlighted by its use in the synthesis of hundreds of protein targets to date.
This review focuses on the development of a number of new ligation technologies that have been inspired by the concept of NCL, together with their utility in the total chemical synthesis of large polypeptides and proteins, with and without modifications.
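The way by-products accumulate over iterative SPPS cycles can be illustrated with a simple calculation. The following is a minimal sketch, assuming that every coupling cycle succeeds independently with the same efficiency; real syntheses vary residue by residue, and the step yields used here are illustrative values rather than data from the literature. The function name is hypothetical.

```python
def full_length_fraction(n_residues: int, step_yield: float) -> float:
    """Idealized fraction of resin-bound chains that reach full length.

    Assumption: each of the (n_residues - 1) coupling cycles onto the first
    residue succeeds with the same, independent probability `step_yield`.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must lie between 0 and 1")
    return step_yield ** (n_residues - 1)


# Illustrative per-cycle efficiencies (assumed values) and chain lengths.
for step_yield in (0.98, 0.99, 0.995):
    for n in (25, 50, 100):
        frac = full_length_fraction(n, step_yield)
        print(f"step yield {step_yield:.3f}, {n:3d} residues: {frac:.1%} full length")
```

Under this idealized model, even a 99% per-cycle efficiency leaves only about 61% of chains at full length for a 50-residue target, and the fraction falls off geometrically with length. This is one way to see why convergent assembly of shorter fragments is attractive.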

Native chemical ligation

The principle of chemical ligation for peptides was initially explored by Kent and co-workers in the early 1990s (see Box 1 for the origin of the concept and intellectual framework leading to the development of NCL). The NCL reaction between a peptide bearing a C-terminal thioester and a peptide bearing an N-terminal Cys residue first involves a reversible trans-thioesterification via nucleophilic attack on the thioester by the Cys thiolate moiety, leading to the formation of a thioester linkage between the two peptide segments (Fig. 1a). The thioester intermediate then undergoes a rapid S→N acyl shift to afford a native amide bond. One of the key features of the reaction is that it operates in purely aqueous media at neutral pH, conditions that aid in the solubility and stability of the unprotected peptide fragments and the protein targets. NCL technology has revolutionized the field of chemical protein synthesis and has been used for the construction of numerous protein targets since the seminal report of the method in 1994 (Ref. 24). Examples that highlight the utility of the method include the synthesis of a biologically active variant of the 166-residue erythropoiesis protein25, the 203-residue covalent dimer of HIV-1 protease26 and, very recently, the 358-residue D-Dpo4 enzyme27. The NCL concept has also been successfully applied in a semisynthetic regime using expressed protein ligation as well as for the preparation of head-to-tail cyclic peptides16,28. There are several excellent reviews that highlight applications of the traditional NCL method and, as such, they will not be discussed in any further detail in this Review29–31. Instead, the remainder of this Review will highlight the development of new methods inspired by the NCL concept that have greatly expanded the scope of ligation chemistry to provide efficient access to a larger number of synthetic protein targets.

Figure 1: Ligation technologies inspired by the NCL concept.

a | Native chemical ligation (NCL) followed by the desulfurization of Cys to Ala at the ligation junction. b | Proposed mechanism for the radical desulfurization of Cys to Ala. c | Toolbox of synthetic thiol-derived amino acids compatible with 9-fluorenyl-methoxycarbonyl (Fmoc)-solid-phase peptide synthesis (SPPS) and an exemplar application of β-thiol Leu and γ-thiol Val for the synthesis of human parathyroid hormone (hPTH, 1) using the NCL–desulfurization methodology49.

Expanding the scope of NCL beyond cysteine. Despite the empowering nature of NCL in protein synthesis, the need for a Cys residue on the N terminus of one of the peptide fragments limits the possible retrosynthetic disconnections that can be considered when using the method, especially given the paucity of Cys in naturally occurring proteins (1.8% abundance). In recent years, several innovative strategies involving the use of N-terminal auxiliaries32,33 have been devised to enable protein disconnections at alternative ligation junctions and to abrogate reliance on N-terminal Cys residues (see Box 1 for details). Although auxiliary-mediated ligations have greatly increased the flexibility of ligation chemistry, such methods generally suffer from prolonged reaction times, whereby hydrolysis and epimerization become the dominant competing pathways and often require tedious auxiliary removal steps, leading to lower overall yields. Thus, auxiliary-based approaches have been used to access only a small number of protein targets to date.

In 2001, the scope of the NCL methodology was greatly expanded through the introduction of a post-ligation desulfurization concept. In the seminal report by Yan and Dawson34, the product of the NCL reaction was treated with either Raney Ni or Pd on Al2O3 to effect reductive cleavage of the sulfhydryl moiety in the Cys side chain, generating a native Ala residue (Fig. 1a). This method therefore permits the use of Cys as a surrogate for ligation sites containing Ala, a substantially more abundant amino acid (8.9% of residues) in proteins. Ultimately this powerful advance expanded the number of ligation-based disconnections for a given protein target and has been successfully implemented in the synthesis of a number of Cys-free polypeptides and proteins20,35. A limitation of the desulfurization protocol was the requirement for a large excess of nickel or palladium that in some cases led to undesirable side reactions, such as tryptophan hydrogenation or methionine demethylthiolization, affording α-amino butyric acid. A milder, metal-free desulfurization strategy was later developed by Danishefsky and co-workers36. This method relies on the use of a water-soluble radical initiator 2,2′-azobis[2-(2-imidazolin-2-yl)propane]dihydrochloride (VA-044) together with tris(2-carboxyethyl)phosphine hydrochloride (TCEP) and a hydrogen atom source (in this case, tBuSH) in aqueous media to effect desulfurization. This radical-promoted desulfurization approach, based on an early report by Hoffmann et al.37, is thought to be initiated by the formation of a thiyl radical at the Cys side chain, which adds reversibly to the phosphine (Fig. 1b). The resulting phosphoranyl radical can then undergo β-scission to produce an alanyl radical and a phosphine sulfide. Rapid hydrogen abstraction by the alanyl radical from an exogenous thiol then generates the native Ala residue. 
Importantly, these conditions are completely chemoselective in the presence of a range of potentially susceptible functionalities, including thioesters and methionine residues. Very recently, Li and co-workers have shown that rapid and clean desulfurization can be effected without the use of a radical initiator through treatment with sodium borohydride and TCEP38.

Since the seminal report by Yan and Dawson34, the post-ligation desulfurization concept has found enormous application in the total chemical synthesis of proteins via disconnection at Ala residues35. However, the methodology has also served as a catalyst for the concept that thiol-derived variants of other canonical amino acids could be used as Cys surrogates in NCL, followed by desulfurization to native residues. This has fuelled the development of synthetic routes towards suitably protected β-thiol, γ-thiol and δ-thiol derivatives of the proteinogenic amino acids that can be directly incorporated into fragments by SPPS and employed in protein synthesis using a ligation–desulfurization manifold. A decade on from the synthesis of the first thiol-derived amino acid, intensive research efforts by a number of research groups have culminated in a comprehensive toolbox of 9-fluorenylmethyloxycarbonyl (Fmoc)-SPPS-compatible thiolated variants of 13 of the proteinogenic amino acids, which have greatly expanded the repertoire of peptide ligation chemistry35,39,40 (Fig. 1c). The first contribution to the amino acid toolbox was β-thiol Phe reported independently by Crich41 and Botti42. The β-thiol Phe moiety was shown to successfully mediate ligation reactions with peptide thioesters in good yields when incorporated on the N terminus of peptides. Subsequent removal of the β-thiol auxiliary could be performed using nickel boride to generate the native Phe residue at the ligation junction; however, this desulfurization can also be performed using a radical initiator43. This proof-of-concept study, which demonstrated that ligation–desulfurization chemistry was possible at amino acids other than Cys, sparked interest in other thiol-derived amino acids by the community, some key examples of which are highlighted below.

Val and Leu are among the most abundant amino acids found in proteins (6.8% and 9.8%, respectively), and it was therefore not surprising that these were early targets for synthesis. Seitz and co-workers made use of a suitably protected variant of penicillamine (β,β-dimethylcysteine) as a valine surrogate44, while Danishefsky and co-workers developed a γ-thiol valine reagent that led to faster reaction kinetics when reacted with peptide thioesters compared with the homologous reactions at β,β-dimethylcysteine, owing to the improved accessibility of the thiol auxiliary45. Importantly, ligation products from both Val surrogates could be cleanly desulfurized using the metal-free method (TCEP and VA-044) to afford native polypeptide products. Following this, syntheses of β-mercapto Leu derivatives were independently reported by the Brik46 and Danishefsky47 groups, the former report showcasing the methodology through the ligation-based assembly of the HIV-1 Tat protein. Another key reagent is the γ-thiol Lys derivative48, which, owing to its ability to mediate ligation at both α-amino and ε-amino groups through six-membered S→N acyl shift transition states, has found numerous important applications in protein synthesis. More specifically, this dual ligation capability offers synthetic access to peptides and proteins bearing natural PTMs at the Lys side chain, including acetylation, ubiquitylation and methylation.

A powerful example of the benefits offered by the thiol-derived amino acid toolbox is highlighted by the consecutive use of β-thiol Leu, γ-thiol Val and Cys for the assembly of human parathyroid hormone (hPTH, 1), reported by Danishefsky and co-workers49 (Fig. 1c). More specifically, bifunctional fragment 2 bearing an N-terminal β-thiol Leu and a C-terminal alkyl thioester was initially ligated with thiophenyl thioester 3 to yield hPTH (1–38, 4). Separately, N-terminal Thz-protected fragment 5, functionalized as a thiophenyl thioester on the C terminus, was reacted with γ-thiol Val fragment 6. Subsequent Thz deprotection of the resulting ligation product afforded the Cys-bearing fragment hPTH (39–84, 7). A final NCL between hPTH (1–38, 4), possessing a C-terminal thioester, and hPTH (39–84, 7), with an N-terminal Cys residue, yielded the full-length hPTH sequence. Global desulfurization then removed all three thiol auxiliaries to generate the native protein hPTH (1–84, 1) after a folding step.

A final noteworthy addition to the toolbox is β-thiolated Asp, which can be prepared in three steps from a commercially available Asp starting material [Boc-Asp(OtBu)–OH]50. This reagent has proved particularly useful owing to the development of an initiator-free chemoselective desulfurization reaction using TCEP and dithiothreitol (DTT) at pH 3, which enables removal of the β-thiol auxiliary in the presence of free sulfhydryl side chains of native Cys residues. This selective desulfurization technique therefore obviates the need for protecting group manipulation in protein targets containing functionally important Cys residues, as highlighted in the synthesis of the extracellular N-terminal domain of the chemokine receptor CXCR4 bearing two PTMs50. A further application of this chemoselective ligation–desulfurization methodology was recently reported by Becker and co-workers, who prepared a number of differentially PEGylated prion proteins by ligation at β-thiol Asp followed by chemoselective desulfurization in the presence of a native unprotected Cys residue51.
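The expanded set of ligation junctions described above can be illustrated with a short script that lists candidate disconnection sites in a sequence. This is a minimal sketch, not a planning tool: the residue table simply collects the junction residues named in this section, the toy sequence is invented for illustration, and the function names are hypothetical. It ignores practical constraints such as fragment solubility, thioester reactivity at hindered residues and the presence of native Cys residues that must be retained.

```python
# Junction residues discussed in this section, keyed by one-letter code.
# The value names the residue used during ligation (removed afterwards by
# desulfurization, except for native Cys).
LIGATABLE = {
    "C": "native Cys",
    "A": "Cys, then desulfurization to Ala",
    "F": "beta-thiol Phe",
    "V": "penicillamine or gamma-thiol Val",
    "L": "beta-thiol Leu",
    "K": "gamma-thiol Lys",
    "D": "beta-thiol Asp",
}


def candidate_junctions(sequence, allowed=LIGATABLE):
    """List (position, residue, surrogate) for each possible junction.

    `position` is the 1-based index of the residue that would sit at the
    N terminus of the downstream fragment. Position 1 is skipped because
    there is no upstream fragment to ligate to it.
    """
    seq = sequence.upper()
    return [
        (i + 1, aa, allowed[aa])
        for i, aa in enumerate(seq)
        if i > 0 and aa in allowed
    ]


def split_at(sequence, positions):
    """Split a sequence into fragments that start at the given 1-based positions."""
    cuts = [0] + [p - 1 for p in sorted(positions)] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]


# Invented toy sequence (not a real protein).
toy = "GSMKTAYIEQRCLSNDGFW"
for pos, aa, how in candidate_junctions(toy):
    print(f"junction before residue {pos} ({aa}): ligate via {how}")
print(split_at(toy, [6, 12]))
```

Restricting `allowed` to `{"C": "native Cys"}` shows how few disconnections are available with Cys alone, which is the limitation that the desulfurization chemistry and the thiol-derived amino acid toolbox were developed to address.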

Directional flexibility for iterative assembly of proteins. Protein assembly via iterative ligation reactions in the N→C direction was originally achieved by harnessing the differing reactivity of thioesters in a kinetically controlled ligation. Kent and co-workers first demonstrated this concept by using the reactivity of a thiophenyl thioester on the C terminus of one fragment with a bifunctional fragment possessing Cys on the N terminus and a less reactive alkyl thioester on the C terminus that does not partake in the ligation reaction. This methodology was initially showcased in the six-segment assembly of the small protein crambin52. The kinetically controlled ligation concept has been further modified for one-pot protein synthesis by the use of a thiol additive — 2,2,2-trifluoroethanethiol (TFET) — which serves to increase the rate of ligation reactions through in situ generation of thioesters with increased reactivity53. Aryl thiol additives, such as mercaptophenylacetic acid (MPAA) (pKa = 6.6) and thiophenol (pKa = 6.6), are more commonly used to accelerate NCL reactions owing to their demonstrated proficiency in thioester exchange reactions with alkyl thioesters as well as their excellent leaving group ability upon reaction with the cysteinyl peptide fragment. Unfortunately, the radical quenching activity of these aryl thiol additives prohibits in situ radical desulfurization of the ligation products, and intermediate purification and lyophilization steps must therefore be carried out before a subsequent ligation reaction can be performed. Alternative methods to extract the aryl thiol species following ligation have been reported, including liquid–liquid extraction54 and solid-phase capture procedures55. However, TFET alleviates the need for these additional steps; the pKa (7.3) leads to highly competent thioester exchange and efficient acylation by the Cys thiol moiety. 
Furthermore, the volatility of TFET (boiling point = 35–37 °C) permits facile post-ligation removal through simple sparging with an inert gas. Alternatively, because the alkyl thiol TFET is a poor radical quencher, it can remain in the reaction mixture during in situ desulfurization of Cys or thiol-derived amino acids at the ligation junction. It should be noted that commercially available TFET often requires distillation before use (depending on the source and purity) and, as a malodorous and volatile thiol, should be handled inside a fume hood.

The power of the one-pot kinetically controlled ligation–desulfurization strategy was showcased in the efficient assembly of four differentially sulfated variants of madanin-1, a 60-amino acid Cys-free thrombin inhibitor produced by the hard tick Haemaphysalis longicornis56 (Fig. 2a). The family of proteins (8–11) was assembled through the use of three suitably reactive peptide fragments, with the middle bifunctional fragments (12–15) possessing all possible sulfated variants at two Tyr sulfation sites (Tyr32 and Tyr35). The preformed TFET thioester 16 was initially ligated to bifunctional (sulfo)peptide fragments 12, 13, 14 or 15 bearing an N-terminal β-thiol Asp residue and a C-terminal alkyl thioester. The ligation proceeded regioselectively at the TFET thioester owing to increased reactivity compared with the alkyl thioester on 12–15. Upon complete reaction, the C-terminal Thr alkyl thioester was activated with 2 vol.% TFET and subjected to a second ligation with cysteinyl peptide 17. The resulting full-length product was finally subjected to in situ global desulfurization using VA-044, TCEP and reduced glutathione to convert Cys and β-SH Asp residues into Ala and Asp, respectively, affording native madanin-1 8 and madanin-1 sulfoproteins 9–11 in excellent yields over the multistep sequence. These synthetic proteins enabled the importance of Tyr sulfation for anticoagulant and thrombin inhibitory activity to be determined. Specifically, Tyr sulfation was shown to provide a 2–3 orders of magnitude improvement in thrombin inhibitory activity over the unmodified madanin-1 homologue56.

Figure 2: Chemical protein synthesis via iterative ligations in the N→C direction.

a | One-pot synthesis of a library of sulfated madanin-1 proteins 8–11 via kinetically controlled ligation56. The ligation proceeds regioselectively at the 2,2,2-trifluoroethanethiol (TFET) thioester owing to increased reactivity compared with the other alkyl thioester; this less reactive thioester can be subsequently converted into the TFET thioester to facilitate a second ligation. b | Synthesis of SUMO-1 peptide conjugate 18 using bis(2-sulfanylethyl)amido (SEA) chemistry69. The N-acyl perhydro-1,2,5-dithiazepine moiety (SEAoff) remains inactive during the first ligation and can be converted into the thioester (through a N→S acyl shift) using a reductant and an exogenous thiol additive to execute the second ligation. c | Synthesis of α-synuclein 22 using acyl hydrazides as thioester surrogates68. The hydrazide moiety remains inactive under native chemical ligation (NCL) conditions and can be converted into a thioester (through activation of the hydrazide with NaNO2 followed by thiolysis of the resulting acyl azide with an external thiol additive) for iterative ligation reactions in the N→C direction. In structures 18 and 21, AEGR = Ala-Glu-Gly-Arg and ISAR = Ile-Ser-Ala-Arg. MPAA, mercaptophenylacetic acid; TCEP, tris(2-carboxyethyl)phosphine hydrochloride.

Several other strategies have also been developed to enable iterative ligation reactions in the N→C direction. The most useful of these fall broadly into the category of thioester precursors and include C-terminal Cys activation57, the bis(2-sulfanylethyl)amido (SEA) auxiliary58, N-sulfanylethylanilide auxiliary59, N-alkyl Cys60,61, 3,4-diaminobenzoic acid (Dbz)62 and o-amino(methyl)aniline (MeDbz)63 linkers, o-aminoanilides64 and peptide acyl hydrazides65–68. The SEA auxiliary, first reported by Melnyk and co-workers, possesses a 1,7-dithiol structure, which allows rapid interconversion between the inactive N-acyl perhydro-1,2,5-dithiazepine moiety (SEAoff) and the N→S acyl shift-active SEA dithiol form (SEAon) through simple redox manipulations. In the reduced form, the SEA auxiliary is competent in ligation chemistry through conversion into a thioester either by exchange with an exogenous additive, such as 3-mercaptopropionic acid, or through trapping of the N→S acyl-shifted SEA thioester with glyoxylic acid. Importantly, the SEAoff cyclic disulfide is compatible with mild reducing agents (for example, MPAA) commonly employed as thiol catalysts in NCL reactions, allowing the use of NCL and SEA ligations in concert. The orthogonality of the SEA auxiliary with NCL was recently highlighted in the synthesis of functional SUMO-1 peptide conjugate 18 (Ref. 69) (Fig. 2b). Initially, SUMO fragment thioester 19 was prepared via activation of the peptide bearing a C-terminal SEA auxiliary (not shown) through exchange with 3-mercaptopropionic acid. The resulting thioester 19 was then subjected to MPAA-catalysed NCL with fragment 20 bearing an N-terminal Cys and a latent C-terminal SEAoff moiety. Subsequent addition of TCEP facilitated the switching of SEAoff→SEAon and the SEA ligation could then be conducted with fragment 21 in a one-pot manner to afford 18.

Peptide acyl hydrazides have also proved to be highly useful thioester surrogates for the N→C assembly of protein targets through ligation chemistry65. Conversion of a given peptide with a C-terminal acyl hydrazide functionality into a thioester is performed through an operationally simple activation of the hydrazide with NaNO2 followed by thiolysis of the resulting acyl azide with an external thiol additive. Crucially, the hydrazide moiety remains inactive under NCL conditions, therefore acting as a masked thioester that can be unleashed for iterative ligation reactions in the N→C direction. An elegant example of the acyl hydrazide-based NCL approach is the preparation68 of α-synuclein 22, a protein that has been implicated in the formation of neuronal Lewy bodies and in the progression of several neurodegenerative disorders, including Parkinson’s disease. Liu et al.68 devised a four-segment N→C sequential ligation strategy starting with the activation of acyl hydrazide fragment 23 with NaNO2, followed by thiolysis (Fig. 2c). The resulting peptide thioester could then be ligated to fragment 24 in an MPAA-catalysed NCL reaction. This procedure was then repeated with fragments 25 and 26 to afford the full-length protein. Global radical desulfurization to effect Cys into Ala conversions at each of the three ligation junctions then afforded synthetic α-synuclein 22 in excellent overall yield.
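The logic of the masked-thioester strategy can be captured in a toy model of N→C assembly. The sketch below is purely illustrative: it tracks only sequence strings and the state of the C terminus, the fragment sequences are invented (they are not the α-synuclein fragments 23–26), and all class and function names are hypothetical. It models no kinetics, yields or side reactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    sequence: str  # one-letter amino acid codes
    c_term: str    # "hydrazide" (masked), "thioester" (active) or "acid"


def activate(frag):
    """Unmask a hydrazide: NaNO2 oxidation, then thiolysis of the acyl azide."""
    if frag.c_term != "hydrazide":
        raise ValueError("only a C-terminal hydrazide can be activated")
    return Fragment(frag.sequence, "thioester")


def ligate(acyl_donor, nucleophile):
    """NCL: a C-terminal thioester reacts with an N-terminal Cys fragment."""
    if acyl_donor.c_term != "thioester":
        raise ValueError("acyl donor must carry an active thioester")
    if not nucleophile.sequence.startswith("C"):
        raise ValueError("nucleophile must start with Cys")
    # The product inherits the (possibly still masked) C terminus of the nucleophile.
    return Fragment(acyl_donor.sequence + nucleophile.sequence, nucleophile.c_term)


def assemble_n_to_c(fragments):
    """Iteratively activate and ligate; return product and junction indices."""
    product = fragments[0]
    junctions = []  # 0-based indices of the Cys introduced at each junction
    for nxt in fragments[1:]:
        junctions.append(len(product.sequence))
        product = ligate(activate(product), nxt)
    return product, junctions


def desulfurize(frag, junctions):
    """Convert the junction Cys residues into Ala."""
    residues = list(frag.sequence)
    for i in junctions:
        if residues[i] != "C":
            raise ValueError(f"residue {i} is not Cys")
        residues[i] = "A"
    return Fragment("".join(residues), frag.c_term)


# Invented four-segment example: three hydrazides and a C-terminal acid.
segments = [
    Fragment("GSM", "hydrazide"),
    Fragment("CKT", "hydrazide"),
    Fragment("CEL", "hydrazide"),
    Fragment("CQW", "acid"),
]
full_length, sites = assemble_n_to_c(segments)
print(full_length, sites)
print(desulfurize(full_length, sites))
```

Because `ligate` rejects any donor whose C terminus is still a hydrazide, the model makes explicit why the middle segments cannot self-condense or react out of order during each NCL step.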

In a manner similar to N→C protein assembly, several effective methods have also been developed for assembling proteins in the C→N direction. The crux of this concept is to precisely control sequential ligation steps through the use of orthogonal protecting groups for N-terminal Cys residues or Cys surrogates. The design and utility of appropriate Cys protecting groups remains a contemporary research focus70; however, several viable strategies have been reported in successful protein syntheses, including Thz71,72 derivatives and acetamidomethyl (Acm)73 protection of Cys. An elegant method from Brik and co-workers employed a Thz-protected δ-thiol Lys residue to facilitate three iterative ligations in the C→N direction at the ε-amino moiety of Lys with ubiquitin chains functionalized as C-terminal thioesters to generate tetraubiquitin 27 (Ref. 74) (Fig. 3a). Protein assembly was accomplished using three ubiquitin fragments, 28 containing an N-terminal δ-thiol Lys, 29 bearing an N-terminal Thz-protected δ-thiol Lys and a C-terminal thioester and peptide thioester 30. Using iterative cycles of benzylmercaptan-catalysed and thiophenol-catalysed ligation reactions and acidic methoxyamine-mediated Thz deprotection steps, four ubiquitin units were assembled to afford the 304-amino acid protein tetramer. Global radical desulfurization of the three δ-thiol Lys residues to native lysines then provided tetraubiquitin 27. Notably, Liu and co-workers have very recently reported the synthesis of hexaubiquitin through iterative acyl hydrazide chemistry, which represents one of the largest proteins to ever be prepared by chemical synthesis75.

Figure 3: Chemical protein synthesis via iterative ligations in C→N direction.

a | Synthesis of the 304-residue tetraubiquitin (27)74. A Thz-protected δ-thiol Lys residue was employed to facilitate three iterative ligations at the ε-amino moiety of Lys with ubiquitin (Ubi) chains functionalized as C-terminal thioesters to generate tetraubiquitin. b | Synthesis of a homogeneously glycosylated variant of human interferon-β (IFNβ, 31)76. Three native Cys residues (that were inappropriately placed for use in native chemical ligation (NCL)) were protected with Acm groups during ligation–desulfurization reactions.

Kajihara and co-workers also demonstrated the power of an iterative C→N ligation strategy for the total synthesis of a homogeneously glycosylated variant of interferon-β (IFNβ, 31)76. The approach involved disconnection of the 166-amino acid target into three fragments, whereby two Cys residues were introduced for NCL reactions, while the three native Cys residues were protected with Acm groups throughout the protein assembly (Fig. 3b). The synthesis began with NCL between N-terminal cysteinyl fragment 32 and the thioester of glycosylated fragment 33, which possessed a Thz residue on the N terminus. Subsequent methoxyamine-mediated Thz deprotection unmasked the N-terminal Cys residue to afford 34, which could then participate in ligation with N-terminal fragment 35. With the full-length protein assembled, desulfurization of the non-native Cys residues was effected under metal-free conditions. Silver acetate-promoted removal of the Acm protection on Cys and saponification of the benzyl ester protection on sialic acid followed by folding furnished homogeneously glycosylated IFNβ (31).

The ability to perform ligation–desulfurization reactions in an iterative manner in both the N→C and C→N directions has greatly improved the efficiency of chemical protein synthesis. With these technologies, the community has redefined the targets that can be produced, with substantially larger targets (>120 amino acids) now becoming more routinely accessible.

Extending NCL to selenocysteine

In parallel with the revolutionary advances of the NCL reaction manifold, there has also been considerable research attention focused on tackling some of the inherent limitations of the technology, specifically the lack of chemoselectivity of the desulfurization reaction in the presence of native Cys residues and the prohibitively slow ligation rates at sterically demanding amino acid junctions. In 2001, three independent groups demonstrated that the 21st amino acid (Sec) was competent in NCL-like transformations with peptide thioesters, thus providing access to large selenopeptides and selenoproteins for the first time77–79. Sec was first acknowledged to be biologically vital based on the selenoenzyme glutathione peroxidase, which displayed selenium-based catalytic activity80. Since then, several selenoproteins have been identified with functions ranging from phospholipid biosynthesis, muscle development and calcium mobilization, to modulators of redox-regulated signalling81–89. Despite being the chalcogenic analogue of Cys, Sec exhibits some strikingly different physicochemical properties. First, Sec exhibits a considerably lower reduction potential (−381 mV) than Cys (−180 mV)90,91. As a result, Sec readily undergoes air oxidation and exists exclusively as a dimeric species (diselenide)92. A reducing agent is therefore required for NCL reactions to proceed efficiently through the generation of the monomeric selenolate78. Second, the pKa of Sec (5.2–5.6) is lower than that of Cys (8.2), meaning that when monomeric, it exists predominantly as selenolate at physiological pH, thus enabling NCL at Sec to be performed at a lower pH and offering higher yields by minimizing thioester hydrolysis.
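The consequence of this pKa difference can be made concrete with the Henderson–Hasselbalch relationship. The sketch below is illustrative only: it uses the pKa values quoted above (taking 5.2 from the quoted Sec range) and treats each side chain as an isolated monoprotic acid, ignoring the local environment effects that shift pKa values in real peptides. The function name is hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# pKa values as quoted in the text; 5.2 is the lower end of the Sec range.
PKA = {"Cys thiol": 8.2, "Sec selenol": 5.2}

for ph in (5.0, 6.0, 7.0):
    for name, pka in PKA.items():
        frac = fraction_deprotonated(pka, ph)
        print(f"pH {ph:.1f}: {name} is {frac:.1%} deprotonated")
```

Under these assumptions, Sec is almost entirely present as the nucleophilic selenolate at pH 7, whereas only a small fraction of Cys is present as the thiolate, which is consistent with Sec-mediated ligation remaining viable at lower pH.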

Since its inception, Sec-mediated NCL has been exploited for the synthesis of a wide range of peptides67,79,93–95 and proteins77,78,96–100. Some of the early examples include a 17-mer fragment of ribonucleotide reductase79 and a bovine pancreatic trypsin inhibitor (BPTI) analogue78, both possessing an intramolecular selenosulfide linkage. In another example, Hilvert and co-workers synthesized a cyclic peptide by macrolactamization of a linear precursor functionalized with a C-terminal thioester and an N-terminal Sec residue by NCL94. In addition, the internal Sec in the ligated cyclic product was shown to be amenable to various synthetic transformations, such as alkylation, oxidative elimination and reductive deselenization94. Sec was also found to be compatible with expressed protein ligation (EPL), demonstrated through the synthesis of RNase A77 and azurin101. In both cases, a synthetic peptide bearing an N-terminal selenocystine, instead of native Cys, was ligated with a large protein thioester derived from recombinant techniques. Very recently, Rozovsky and co-workers have developed a method for the incorporation of Sec into expressed protein fragments by enriching the growth medium for Escherichia coli with Sec, such that it could be subsequently incorporated using the Cys codon102. Importantly, the ENLYFQ motif was N-terminally fused to Sec, and the resulting Q–Sec junction in the mature construct could be cleaved using tobacco etch virus (TEV) protease to afford proteins with an N-terminal Sec residue. This enabled the development of expressed protein ligation at Sec with large expressed fragments (Sec-EPL). This work provides the impetus for the site-specific modification of Sec residues within proteins in the future; however, the method currently cannot accommodate Cys residues in the Sec-containing fragment owing to non-selective incorporation.

In 2010, a landmark contribution to Sec-based ligation chemistry came from Dawson and co-workers, who discovered that deselenization of a Sec residue could be achieved under mild conditions using the reducing agent TCEP and a hydrogen donor, such as DTT (Fig. 4a). Importantly, this method does not require a radical initiator and is completely chemoselective in the presence of unprotected Cys residues95. This pivotal finding highlighted the enormous potential of Sec-based NCL as a method for the construction of proteins retaining native Cys residues that may be crucial to the structure and/or function of a given protein target. It is important to note that before the development of this methodology, the synthesis of Cys-containing targets under an NCL–desulfurization regime necessitated the use of protecting groups on the side chains of Cys residues during the desulfurization step. The observed selectivity is proposed to arise from the weaker C–Se bond (relative to the C–S bond), favouring formation of the alanyl radical at Sec over Cys. Mechanistically, the deselenization is proposed to proceed through reversible addition of a selenium-centred radical to the phosphine, leading to a phosphorus-centred radical species. The highly thermodynamically favourable production of a phosphine selenide is then proposed to drive the homolysis of the C–Se bond. The resulting β-carbon-centred radical is then capable of hydrogen atom abstraction to generate the native alanine residue (Fig. 4b). Initially, the ligation–chemoselective deselenization approach was applied to small peptidic systems, including a 38-residue fragment of the redox enzyme glutaredoxin 3 (Grx3, 1–38)95. However, the power of this methodology was further exemplified by Metanis et al. in the synthesis of the 125-amino acid human enzyme phosphohistidine phosphatase (PHPT1, 40)97.
With three Cys residues located in the C-terminal region of the sequence, a strategy that employed both traditional NCL and Sec-mediated ligation reactions was devised, with three segments undergoing sequential ligations in the C→N direction (Fig. 4c). Bifunctional segment 37 was prepared bearing an N-terminal Sec residue protected as a Sez and functionalized with a C-terminal thioester for a standard NCL reaction with the N-terminal Cys residue of fragment 38. The ligated product was treated with MeONH2 to effect conversion of the Sez moiety into Sec and subjected to Sec-mediated NCL with C-terminal thioester segment 39. Interestingly, while Sez was demonstrated to be stable under Fmoc-SPPS and NCL conditions (similar to Thz), the authors reported that MeONH2-promoted ring opening was faster in the case of Thz (based on a model system). The purified ligation product could then be successfully deselenized to afford PHPT1 (40). Importantly, the deselenization of Sec proceeded smoothly without modifying the three unprotected Cys residues present in the sequence, thus highlighting the selectivity of the protocol. The same group later accomplished the total synthesis of the 122-residue human selenoprotein M (SELM) through the iterative Sec-mediated ligation–deselenization assembly of four fragments in the C→N direction98. Notably, SELM comprises a CXXU motif that is crucial for its biological activity; this would otherwise be inaccessible with traditional Cys-based ligation methods.

Figure 4: Applications of peptide ligation chemistry at the 21st amino acid (Sec)

a | Native chemical ligation (NCL) at Sec followed by chemoselective deselenization in the presence of unprotected Cys. b | Proposed mechanism for the chemoselective deselenization of Sec. c | Synthesis of PHPT1 (40) via a key chemoselective deselenization step97. d | Synthesis of eglin C (41) via ligation–oxidative deselenization96. MPAA, mercaptophenylacetic acid.

A further exploitation of the Sec reactivity was demonstrated by the Payne96 and Metanis103 groups, who independently discovered that treatment of Sec with TCEP in the presence of an exogenous oxidant leads to clean conversion into serine at the ligation junction. The discovery of this oxidative deselenization transformation has further broadened Sec ligation chemistry beyond Ala disconnections96,103,104. While the Metanis group performed deselenization in the presence of oxygen103, Payne and co-workers employed oxone as the oxidant96,104. Notably, the latter approach has been successfully employed for the synthesis of MUC4 and MUC5AC-based glycopeptides96,104. Furthermore, the methodology was used to assemble the Cys-free protein eglin C (41) via a single ligation between C-terminal thioester 42 and selenocystine-bearing fragment 43, followed by oxidative deselenization96 (Fig. 4d).

In a manner similar to the improvement in the scope of NCL through the development of thiol-derived amino acids, expansion of the Sec ligation was first attempted by Danishefsky and co-workers, who developed the synthesis of a trans-γ-selenoproline building block in three steps from orthogonally protected hydroxyproline93. This amino acid was subsequently incorporated into model peptides using Fmoc-based SPPS and was successfully used in ligation–deselenization chemistry with various peptide thioesters. Malins et al. also developed an efficient synthesis of a suitably protected β-selenophenylalanine from Garner’s aldehyde, which could also be successfully employed in ligation chemistry followed by chemoselective deselenization in the presence of unprotected Cys67. Taken together, the Sec-based ligation methods, coupled with chemoselective deselenization chemistry, represent powerful new approaches for accessing protein targets without strategically placed Cys residues or where chemoselective removal of the ligation auxiliary in the presence of other sensitive residues (for example, structurally or functionally important Cys residues) is necessary.

Selenoester acyl donors for acceleration of ligation-based protein assembly. The rate of NCL is known to be strongly influenced by the steric and electronic environment of the C-terminal amino acid residue of the thioester component. For instance, peptide thioesters bearing sterically hindered β-branched amino acids at the C terminus (for example, Ile, Thr and Val) suffer from sluggish reaction rates, affording lower ligation yields owing to competing thioester hydrolysis. In the case of C-terminal proline thioesters, an n→π* electronic donation into the carbonyl carbon leads to reduced electrophilicity of the prolyl thioesters, making Pro-Cys junctions synthetically intractable105. A solution to this problem was reported by Durek and Alewood, who rationalized that replacement of the thioester moiety by an alkyl selenoester would lead to productive ligation chemistry, owing to the superior leaving group ability of the selenolate over the thiolate. Indeed, ligations at model peptides bearing a C-terminal prolyl selenoester were complete in 2 hours, nearly 350 times faster than traditional NCL106 (Fig. 5a). This initial study laid the foundation for the use of selenoesters as acyl donors in the ligation-based assembly of proteins and was followed by several reports describing efficient methods for accessing peptide selenoesters of various lengths both in solution107 and on the solid phase108.

Figure 5: Peptide ligation chemistry using selenoesters as the acyl donor.

a | Native chemical ligation (NCL) using proline selenoesters. b | Additive-free diselenide–selenoester ligation (DSL)–deselenization technology. c | Application of additive-free DSL together with traditional NCL for the one-pot synthesis of the Mycobacterium tuberculosis protein ESAT-6 (48)107. The synthesis of the target was accomplished in high yield via a one-pot kinetically controlled ligation of three fragments in the N→C direction, exploiting the inherent difference in reactivity of thioesters and selenoesters. Key steps involved chemoselective DSL, followed by 2,2,2-trifluoroethanethiol (TFET)-mediated NCL with concomitant deselenization and subsequent desulfurization.

More recently, Mitchell et al. postulated that substantial rate accelerations in peptide ligation chemistry could be achieved by harnessing the superior reactivity of C-terminal selenoesters (in this case, aryl selenoesters) in combination with the improved nucleophilicity of Sec at the N termini of the other reacting peptide fragments. To test this hypothesis, two model peptides were chosen for the initial experiments, one functionalized as a C-terminal Ala phenyl selenoester and the other a diselenide dimer possessing an N-terminal selenocystine residue. The authors initially explored the possibility of implementing electrochemistry for reduction of the diselenide to the ligation-active selenolate in order to circumvent the concomitant deselenization of Sec that occurs in the presence of phosphine reductants. However, in a serendipitous finding, the control experiment (without application of a current) led to the generation of the desired ligation product107. Strikingly, the ligation proceeded cleanly when the peptide fragments were simply dissolved in denaturing buffer without any additives. The additive-free reaction, which was subsequently dubbed diselenide–selenoester ligation (DSL), was also complete within 60 seconds at room temperature, representing a large rate acceleration over the analogous reaction under an NCL manifold. This rate acceleration was also maintained for ligations at sterically hindered selenoesters, which were complete within 10 minutes, comparing favourably with the analogous NCL reactions, which require up to 48 hours. Because selenoesters are considerably more prone to hydrolysis than thioesters (at pH > 7), careful pH adjustment during ligation is crucial. Fortunately, the ability to perform these ligations at acidic to neutral pH (5–7) enables competing selenoester hydrolysis to be circumvented.
It is important to note that the final ligation product is typically obtained as a mixture of symmetrical diselenide, asymmetrical diselenide and product bearing a selenoester linkage on the Sec used for ligation. However, all of these products coalesce into a single product following in situ deselenization (via treatment with TCEP and DTT) (Fig. 5b). This unique transformation prompted a series of experimental and computational investigations to gain mechanistic insight into the rapid ligation methodology. Given the absence of any reductants or additives, it has been proposed that there is a unique initiation step for the DSL transformation to provide a competent intermediate that can enter a native chemical ligation-like pathway. In addition, based on the data compiled from theoretical and experimental observations, precipitation of diphenyl diselenide (DPDS), a by-product generated during ligation in aqueous buffer, was proposed to be a potential driving force for the reaction. It is worth mentioning that the DPDS produced during ligation acts as a radical quencher and thus needs to be removed through hexane extraction before performing an in situ deselenization reaction. Recently, this ligation has also been used in conjunction with oxidative deselenization technology to afford serine at the ligation junction, as showcased in the synthesis of fragments of some human mucin glycoproteins104.

To illustrate the synthetic utility, the additive-free DSL–deselenization methodology was also applied in the construction of two proteins107. First, intracellular chorismate mutase from Mycobacterium tuberculosis was assembled through a one-pot ligation–deselenization approach with 57% yield over two steps. After folding, the full-length enzyme was found to possess structure and catalytic activity similar to those of the wild-type enzyme. The orthogonality of the DSL chemistry with NCL was also exemplified through the synthesis of another M. tuberculosis protein, early secretory antigenic protein 6 (ESAT-6). The synthesis of the target was accomplished in high yield via a one-pot kinetically controlled ligation of three fragments in the N→C direction (Fig. 5c). Specifically, the inherent difference in the reactivity of thioesters and selenoesters was exploited through chemoselective DSL between bifunctional middle diselenide dimer segment 44, with an N-terminal selenocystine and a C-terminal thioester, and peptide phenylselenoester 45. The ligated product 46 was generated exclusively in minutes and could be subsequently subjected to NCL with C-terminal segment 47 bearing an N-terminal Cys using TFET as an additive. The presence of TCEP in the NCL step also led to the concomitant deselenization of Sec40 used for the initial DSL reaction. Upon desulfurization using VA-044, TCEP and glutathione, ESAT-6 (48) was obtained in 43% yield over four steps following a single high-pressure liquid chromatography (HPLC) purification. The speed and efficiency of the additive-free DSL technology make it a valuable addition to the ligation chemistry toolbox for the chemical synthesis of proteins.
With salient features such as operational simplicity, unprecedented reaction rates (even at sterically encumbered junctions), broad pH tolerance (pH 3–7) and compatibility with unprotected Cys residues and NCL, it is likely that this methodology will find wide application in the chemical synthesis or semi-synthesis of numerous other important protein targets with or without PTMs and other modifications in the future.
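
As a rough aid to comparing the multistep yields quoted above (57% over two steps for chorismate mutase and 43% over four steps for ESAT-6), an overall yield can be converted into a geometric-mean yield per step. This is an illustrative calculation only: the function name is hypothetical, and it assumes that every step contributes equally, which the original reports do not claim.

```python
def mean_step_yield(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean yield per step implied by an overall yield (both as fractions)."""
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must be a fraction in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall_yield ** (1.0 / n_steps)


# Overall yields and step counts as quoted in the text.
syntheses = {
    "chorismate mutase": (0.57, 2),
    "ESAT-6": (0.43, 4),
}

for protein, (overall, steps) in syntheses.items():
    per_step = mean_step_yield(overall, steps)
    print(f"{protein}: {overall:.0%} over {steps} steps, about {per_step:.0%} per step")
```

On this equal-contribution assumption, the two syntheses correspond to roughly 75% and 81% per step, respectively, so the longer four-step sequence is not less efficient per operation.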

Rapid protein assembly via DSL chemistry at selenol-derived amino acids. Since the first report of DSL in 2015, a number of suitably protected selenylated amino acids, including β-selenoLeu109, β-selenoAsp110 and γ-selenoGlu110, have been developed with a view to broadening the scope of the methodology (Fig. 6). Each of these building blocks is compatible with Fmoc-SPPS and has been successfully employed in DSL–deselenization transformations, including protein synthesis, as highlighted below.

Figure 6: Protein synthesis via diselenide–selenoester ligation (DSL)–deselenization at diselenide-derived amino acids.

a | Synthesis of a library of differentially sulfated chemokine-binding UL22A 54–57 proteins109. b | β-selenoAsp, γ-selenoGlu and the one-pot synthesis of the tick-derived thrombin inhibitor hyalomin-2 (58)110. Ligation–deselenization, purification, quantification and bioassay could be achieved within 3 hours, owing to the rapid kinetics of deselenization of β-selenoAsp. SPPS, solid-phase peptide synthesis.

The utility of DSL–deselenization chemistry at β-selenoLeu has been powerfully demonstrated in the synthesis of a library of differentially modified variants of the CCL5 (also known as RANTES) chemokine-binding protein UL22A from human cytomegalovirus, which were predicted to be sulfated at Tyr65 and Tyr69 (Ref. 109). As there are no Cys or suitably placed alanine residues in the native sequence of UL22A, the protein could not be assembled through traditional ligation methods. Wang and co-workers therefore chose to disconnect UL22A at a challenging Val-Leu junction, leading to two target fragments, diselenide 49 bearing an N-terminal β-selenoLeu and N-terminal fragments 50–53 with variation in the sulfation state at Tyr65 and Tyr69 (Fig. 6a). The synthesis of 49 was achieved by Fmoc-SPPS, incorporating the suitably protected β-selenoLeu, which was in turn accessed from Garner’s aldehyde in eight steps109. Additive-free DSL reactions between 49 and 50–53 were initially unsuccessful, presumably owing to the sterically demanding nature of the Val-Leu junction. Nonetheless, all ligations with (sulfo)peptide phenylselenoesters 50–53 proceeded smoothly in the presence of TCEP and DPDS as additives, reaching completion in just 1 hour. Following in situ deselenization and HPLC purification, the desired (sulfo)proteins were obtained in excellent yields. The doubly sulfated UL22A was shown to bind RANTES 2.5 orders of magnitude more strongly than the unmodified protein, thus validating the importance of sulfation for biological activity109.
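
For readers less used to logarithmic comparisons, the 2.5 orders of magnitude quoted above for the binding improvement corresponds to a fold change of 10 raised to the power 2.5. A minimal sketch of the conversion follows; the helper name is an assumption made for illustration and does not come from the original study.

```python
def fold_change(orders_of_magnitude: float) -> float:
    """Convert a difference expressed in orders of magnitude (log10 units) to a fold change."""
    return 10.0 ** orders_of_magnitude


# Value quoted in the text for doubly sulfated versus unmodified UL22A.
improvement = fold_change(2.5)
print(f"2.5 orders of magnitude is approximately {improvement:.0f}-fold")
```

In other words, the doubly sulfated protein binds roughly 300-fold more strongly than the unmodified protein.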

Although DSL reactions generally reach completion in minutes, the chemistry is followed by a comparably sluggish deselenization step, normally requiring 6–16 hours. As such, improving the rate of deselenization would provide access to polypeptides and proteins on extraordinarily short timescales. Such an innovation was recently described by Mitchell et al., who demonstrated that the presence of a weak C–Se bond in β-selenoAsp and γ-selenoGlu enables rapid and clean deselenization in less than a minute, orders of magnitude faster than deselenization at Sec110. The exceptional rate increase is thought to be a result of stabilization of the carbon-centred radical, generated during deselenization, by the neighbouring carboxylate functionality in the Asp and Glu derivatives. These selenoamino acids were synthesized in three steps starting from commercially available Boc-Asp(OtBu)–OH and Boc-Glu(OtBu)–OH, respectively, with the key selenol functionality installed through an electrophilic selenylation reaction. These could be readily incorporated into model peptides using the Fmoc-SPPS strategy and were demonstrated to undergo ligation followed by rapid deselenization, furnishing desired ligated products in excellent yields. Based on these promising results, it was reasoned that this rapid deselenization combined with the expedient ligation reaction could provide a means to accelerate chemical protein synthesis. To explore this possibility, a library of tick-derived thrombin inhibitors (hyalomin-2, hyalomin-3 and hyalomin-4)111 was prepared using one-pot ligation–deselenization technology at β-selenoAsp. As exemplified for hyalomin-2 (58) in Fig. 6b, the hyalomins were assembled from two fragments, one functionalized as a C-terminal phenylselenoester (59) and the other as a peptide dimer bearing an N-terminal β-selenoAsp moiety (60)110. A single one-pot DSL–deselenization transformation facilitated the production of each of the hyalomin proteins within minutes. 
The entire synthetic procedure, HPLC purification, solvent evaporation (via centrifugal concentration), quantification and thrombin inhibition bioassay could be performed within just 3 hours, opening the exciting possibility of rapidly generating structure–activity relationship (SAR) data on small proteins using this technology in the future.

Owing to the rapid kinetics of deselenization of β-selenoAsp and γ-selenoGlu, it was hypothesized that this step could be performed chemoselectively in the presence of an unprotected Sec residue, enabling access to native selenoproteins. Accordingly, selenoprotein K (SelK) was selected as a synthetic target to validate this concept. SelK contains a Sec residue close to the C terminus (Sec92) that is responsible for the formation of an intermolecular diselenide in the native homodimeric protein112–114. Biologically, SelK is an endoplasmic reticulum membrane protein, believed to be involved in regulating cellular redox balance in cardiomyocytes88 and stimulating Ca2+ flux to control immunity86. The protein was disconnected between Tyr60 and Asp61, necessitating the synthesis of 59-residue peptide phenyl selenoester 61, as well as 62, which possesses an intramolecular diselenide between the native Sec and the N-terminal β-selenoAsp moiety (Fig. 7). Both fragments were made by standard Fmoc-SPPS methods. For 62, the Sec and β-selenoAsp residues were introduced in suitably protected form, and cleavage, deprotection and purification afforded exclusively the intramolecular diselenide product. The purified segments were next reacted under additive-free DSL conditions, providing the ligated product as the intramolecular diselenide in 62% yield. Treatment of this intermediate with TCEP (in the absence of an external hydrogen atom source) for just 2 minutes effected the chemoselective deselenization of β-selenoAsp over Sec to afford SelK. In an attempt to streamline the process, a one-pot ligation–chemoselective deselenization strategy was also employed and provided direct access to SelK (63) in 40% yield110.

Figure 7: One-pot synthesis of 21 kDa homodimeric SelK.

Selenoprotein K (SelK, 63) was prepared by additive-free ligation followed by chemoselective deselenization of β-selenoAsp in the presence of Sec110. In structures 62 and 63, GR = Gly-Arg. DSL, diselenide–selenoester ligation.

Based on early applications of DSL technology, its speed and unrivalled chemoselectivity are expected to find widespread application in providing rapid access to therapeutic peptides and protein libraries in the near future, including those possessing the 21st amino acid (Sec). Moreover, access to selenylated derivatives of other proteinogenic amino acids will further expand the scope of this methodology.

Summary and outlook

With the recent advances in peptide ligation technology, it is clear that chemical synthesis can now be used to produce large polypeptide and small protein therapeutics in a highly robust and efficient manner. As such, it is possible that these methods can provide a viable alternative to traditionally used recombinant expression technologies while providing the additional benefit and flexibility of incorporating PTMs or bespoke modifications in a site-specific manner to attenuate structure, function and stability. Inspired by the transformational NCL concept, recently developed methodologies have overcome many of the limitations of the seminal approach and have expanded the number of accessible protein targets as well as the efficiency of chemical protein synthesis. For example, with access to a range of synthetic thiolated amino acids, the repertoire of NCL has been greatly broadened and provides an enormous amount of retrosynthetic flexibility for accessing a protein target by total chemical synthesis. The potential to purchase these reagents from commercial vendors in the future should allow further uptake of these technologies by the community. Furthermore, the rapid kinetics of the recently reported DSL technology provides a viable avenue to overcome one of the remaining key challenges of standard peptide ligation chemistry, the sluggish rates of reaction at sterically hindered junctions. The development of multi-component protein assembly in the N→C or C→N direction, coupled with the orthogonality of NCL-based and DSL-based methods, raises the exciting possibility of generating proteins with minimal handling and intermediary purification steps and on unprecedented timescales. While the median size of a human protein is 375 residues, most proteins that have been generated by chemical synthesis to date are half this size. 
However, the development of orthogonal N-terminal protection strategies and masked C-terminal acyl donors, coupled with NCL and DSL chemistry, now provides a means to target proteins of increasing size and complexity. Indeed, the groups of Kay115, Liu27,116 and Klussmann117 have recently reported the preparation of larger targets by total chemical synthesis. An alternative means of generating larger proteins bearing homogeneous modifications is through EPL techniques118. This methodology is well established for NCL, and recently developed recombinant methods for Sec incorporation open the possibility of EPL under a DSL manifold.

The plethora of enabling methods available for chemical protein synthesis also has the potential to open up new fields of research. For example, the speed and efficiency of the latest ligation techniques offer the exciting possibility of generating protein libraries, thus enabling synthetic protein medicinal chemistry. While such a platform cannot compete with the large number of targets that can be generated through phage display119 or expanded codon methodologies120,121, it has the potential to fuel peptide and protein drug discovery efforts by enabling focused library generation and the establishment of SARs in a manner similar to that of medicinal chemistry programmes with small molecules, where modified, unnatural and/or D-amino acids can be installed site-selectively. The bottleneck of chemical protein synthesis is no longer ligation-based assembly but rather the time-consuming synthesis of the suitably functionalized peptide segments by SPPS, along with laborious HPLC purification and freeze-drying steps. While it is still very difficult to predict the efficiency of the synthesis of a given peptide target by SPPS, a number of recent modifications to the standard method, namely, microwave heating and flow chemistry122, have the potential to accelerate this time-consuming process. Integrating automated SPPS with purification would open up the tantalizing possibility of performing semi-automated protein synthesis in the future.

How to cite this article

Kulkarni, S. S., Sayers, J., Premdjee, B. & Payne, R. J. Rapid and efficient protein synthesis through expansion of the native chemical ligation concept. Nat. Rev. Chem. 2, 0122 (2018).