Introduction

Covalent attachment of ubiquitin to proteins governs a wide array of cellular processes, including cell division, DNA repair, endocytosis, cellular signaling, and protein quality control1,2,3,4. The sequential action of three enzymes (E1 ubiquitin-activating enzyme, E2 ubiquitin-conjugating enzyme, and E3 ubiquitin ligase) results in attachment of ubiquitin to a substrate protein, usually via an amide (isopeptide) bond that links the C-terminal carboxyl group of ubiquitin with one or more lysine side chains of the protein substrate (Figure 1)5,6. Ubiquitin itself possesses seven lysine residues (Lys6, 11, 27, 29, 33, 48, and 63), enabling it to form ubiquitin polymers; chains with different linkages signal different functional outcomes for the tagged proteins3,7,8,9. All seven lysines are used for chain formation in vivo, as is the N-terminal α-amino group, the latter leading to 'linear' ubiquitin chains. Chains can be homopolymeric or, less commonly, of mixed linkage. Branched ubiquitin chains, which use different lysines of a single ubiquitin for chain extension, can also form10,11,12,13.
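The lysine positions listed above can be checked against the ubiquitin primary sequence. The following is a minimal sketch: the 76-residue sequence string is the commonly cited human ubiquitin sequence, supplied here as an assumption (it is not given in this review and should be verified against a sequence database), and the function names `lysine_positions` and `linkage_sites` are hypothetical.

```python
# Human ubiquitin, 76 residues. Assumption: this sequence is supplied for
# illustration and is not taken from the review; verify it before reuse.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def lysine_positions(sequence: str) -> list[int]:
    """Return the 1-based positions of every lysine (K) in a protein sequence."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == "K"]


def linkage_sites(sequence: str) -> list[str]:
    """List potential chain-linkage sites: Met1 (alpha-amino group) plus each lysine."""
    return ["Met1"] + [f"Lys{pos}" for pos in lysine_positions(sequence)]


if __name__ == "__main__":
    print(lysine_positions(UBIQUITIN))
    print(linkage_sites(UBIQUITIN))
```

Counting Met1 together with the lysines gives the set of homotypic linkage types discussed in the text.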

Figure 1

The ubiquitin (Ub) modification cycle.

Following the discovery of ubiquitin, related proteins called ubiquitin-like proteins (Ubls) were also identified; these proteins share a common core architecture called the β-grasp fold14. As with ubiquitin conjugation, a similar cascade of E1, E2 and E3 enzymes, specific to each Ubl, is utilized to covalently link Ubl and substrate15. Besides ubiquitin, the most frequently employed Ubl is SUMO (small ubiquitin-related modifier). In humans, SUMO is present as four isoforms, each encoded by a distinct gene16. Interestingly, hybrid SUMO-ubiquitin chains have also been described17. The variety of covalently ligated ubiquitin/Ubls and their polymeric forms creates significant challenges for the deconjugating enzymes in identifying and cleaving the appropriate substrates.

Deubiquitinating enzymes (DUBs) and Ubl-specific proteases (ULPs) catalyze the cleavage of ubiquitin or Ubls from substrate proteins and also process C-terminally extended precursor forms of these modifiers (Figure 1)18. DUBs and ULPs can be classified into one of two mechanistic classes: (1) thiol proteases, which are mechanistically and structurally related to the well-studied cysteine protease papain and rely on a nucleophilic cysteine in the active site for catalysis, and (2) metalloproteases, which coordinate a Zn2+ ion in the active site and use a nucleophilic water ligated to the metal to hydrolyze the isopeptide linkage19. The JAB1/MPN/MOV34 metalloproteases (JAMMs) include multiple DUBs and at least one ULP (a deneddylase, i.e., an enzyme that cleaves the Nedd8 Ubl from substrates). The thiol protease class includes the great majority of DUBs and ULPs. They are categorized into distinct families according to the structure of their catalytic domains. As with the JAMMs, the same family may have members that cleave ubiquitin, Ubls, or both. It remains difficult to predict these specificities based on enzyme primary sequence alone.

The four known eukaryotic thiol protease DUB families are the ubiquitin C-terminal hydrolases (UCHs), ubiquitin-specific proteases (USPs), ovarian tumor (OTU) proteases, and Machado-Josephin-domain proteases20,21. As will be discussed later, bacterial and viral thiol proteases outside of these families have been found to have DUB activity; it is noteworthy that these bacterial and viral ubiquitin-cleaving enzymes all function within eukaryotic cells. Similar to the DUBs, there are multiple SUMO-specific thiol protease families: the SUMO/sentrin-specific proteases (SENPs; which are related to yeast Ulp1), desumoylating isopeptidases (DESIs), and ubiquitin-specific protease-like 1 (USPL1)16. SUMO proteases of the SENP/ULP class are the most phylogenetically widespread.

Nearly 100 DUBs have been identified in humans. Since the capacity for ubiquitin deconjugation in cells is extremely high, these activities must be kept under tight control. Accordingly, DUB (and ULP) activity is regulated by a host of factors. In many cases, DUBs and ULPs have domains extending from either end of their catalytic domains that help regulate their activity. Such regulation may occur by facilitating enzyme-cofactor interactions, targeting enzymes to specific cellular compartments, maintaining enzymes in an auto-inhibited state, and/or altering their affinity for substrate20.

DUBs and ULPs display high specificity toward their substrates. They can differentiate between ubiquitin and Ubls, show preferences for particular polymeric forms of ubiquitin or Ubls, and distinguish among distinct conjugated substrates. The molecular basis of this specificity is the subject of the present review. We focus on revelations about specificity derived from recent structural studies. We also discuss non-eukaryotic DUBs and ULPs and their sometimes surprising specificities.

Ubiquitin and Ubl recognition by DUBs and ULPs

Modifier properties

The modifier proteins sport surface features that aid in their recognition by DUBs and ULPs. Although ubiquitin is a small, compact protein with a rigid core, it contains several important motifs for interactions with other proteins. The two motifs most commonly observed contacting DUBs are the so-called Ile44 patch (comprising Ile44, Leu8, Val70 and His68) and the Ile36 patch (Ile36, Leu71 and Leu73)22,23. Other protein-binding elements utilized by ubiquitin are the Phe4 patch (Gln2, Phe4 and Thr14), the TEK box (Lys6, Lys11, Thr12, Thr14 and Glu34), and the Asp58 patch (Arg54, Thr55, Ser57 and Asp58)3.

Sequence alignment of ubiquitin and the Ubls SUMO, Nedd8, ISG15 and Fat10 reveals that, aside from Nedd8, the other Ubls exhibit very little sequence conservation with ubiquitin in these motifs (Figure 2). Both the Ile36 and Ile44 patches are conserved in Nedd8, and the Nedd8 Ile44 patch binds directly to the deneddylase Den1/SENP824,25. However, the Ile44 patch is not always a key DUB/ULP contact spot. SdeA, a bacterial DUB, does not engage in any interactions with this patch when bound to a ubiquitin suicide substrate26. Likewise, new crystal structures of the USP CYLD bound to either Met1- or Lys63-linked diubiquitin revealed that the Ile44 patch of the distal ubiquitin (the one with its C-terminal carboxyl in amide linkage) has no direct interactions with the enzyme27. A similar observation was made previously for USP7 bound to ubiquitin aldehyde28.

Figure 2

Sequence alignment of ubiquitin and Ubls (A) and ubiquitin surface elements important for protein binding (B). The various patches that ubiquitin (PDB code: 1UBQ, ref. 31) uses to bind other proteins are highlighted as follows: the C-terminal LRLRGG motif is colored green, the Ile44 patch is red, the Ile36 patch is blue, the Phe4 patch is cyan, the TEK box is orange, the Asp58 patch is purple, and Ser65, which is phosphorylated by the kinase PINK1, is black.

Perhaps the most significant feature of ubiquitin and Ubls for cognate protease recognition is their flexible C-terminal tail29. The ubiquitin/Ubl tail is stabilized by several interactions in the protease active site cleft. DUB and ULP deconjugation of their cognate modifier proteins depends heavily on these C-terminal residues (labeled P6-P1 in Figure 2). For example, a single amino-acid exchange of the Ala at position P5 in Nedd8 to an Arg, which is the ubiquitin P5 residue, markedly decreased the affinity of Den1 for the mutated Ubl, likely due to steric interference with the ULP25. Conversely, for USP21, which exhibits dual specificity for ubiquitin and ISG15, Arg72 at P5 of ubiquitin is stabilized through formation of a salt bridge with an invariant Glu in USP2130. This Arg residue is present in the sequence of ISG15 but not Nedd8. Were Arg72 not engaged in DUB binding, one might predict that discrimination against Nedd8 would not be seen; this is borne out for the prokaryotic DUB SdeA26.
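The P6-P1 numbering used above can be made concrete with a short sketch. It labels the last residues of a modifier tail so that P1 is the C-terminal residue, then reports where two tails differ. The ubiquitin tail is the LRLRGG motif named in Figure 2; the Nedd8 tail string is an assumption consistent with the Ala at P5 described in the text, and the function names are hypothetical.

```python
def label_tail(tail: str) -> dict[str, str]:
    """Map C-terminal residues to subsite labels; the last residue is P1."""
    n = len(tail)
    return {f"P{n - i}": aa for i, aa in enumerate(tail)}


def tail_differences(tail_a: str, tail_b: str) -> dict[str, tuple[str, str]]:
    """Return the P positions at which two equal-length tails differ."""
    if len(tail_a) != len(tail_b):
        raise ValueError("tails must have the same length")
    a, b = label_tail(tail_a), label_tail(tail_b)
    return {p: (a[p], b[p]) for p in a if a[p] != b[p]}


UB_TAIL = "LRLRGG"     # ubiquitin residues 71-76 (LRLRGG motif, Figure 2)
NEDD8_TAIL = "LALRGG"  # assumption: Nedd8 tail, with Ala at P5 as in the text

if __name__ == "__main__":
    print(label_tail(UB_TAIL))
    print(tail_differences(UB_TAIL, NEDD8_TAIL))
```

Under these assumed tails, the only difference falls at P5 (Arg in ubiquitin, Ala in Nedd8), which is the position the Den1 and USP21 examples turn on.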

Other ubiquitin/Ubl features also contribute to binding of their cognate deconjugating enzymes. In the β-grasp fold, a central α-helix is cradled by a curved β-sheet (Figure 3)31. Most co-crystal structures of ubiquitin-DUB complexes reveal that the two-residue loop (Leu8-Thr9) that connects the β1 and β2 strands nestles into a binding pocket in DUBs well away from the active site28,32,33. Binding by this loop in Nedd8 also generates several key van der Waals contacts with the ULP Den124,25. By contrast, inspection of co-crystal structures of SUMO with SUMO proteases indicates that the SUMO β1-β2 loop has little direct involvement in binding to the proteases34,35,36. Interestingly, ubiquitin co-crystal structures with UCH and USP family DUBs suggest that this ubiquitin loop adopts UCH-specific and USP-specific conformations37.

Figure 3

Structural comparison depicting the conserved β-grasp fold of ubiquitin and Ubl proteins (PDB codes: 1UBQ, ref. 31; 1NDD, ref. 146; 1WM3, ref. 147; 1Z2M, ref. 148; and 2KWC, ref. 149). ISG15 has tandem ubiquitin folds. The C-terminal glycine of ubiquitin and Ubls is colored red, except for SUMO2 because it was not ordered in the crystal structure.

A newly identified feature of ubiquitin, that it can be phosphorylated at Ser65 by PINK138,39, provides an added element to consider in substrate specificity. Although yeast lacks an ortholog for PINK1, phosphorylation of ubiquitin Ser65 has also been shown in yeast and may serve a conserved regulatory function40. Phosphorylated ubiquitin can adopt two states in solution; one may limit accessibility of its tail41. Most DUBs tested so far have only weak activity toward ubiquitin chains composed of phosphorylated ubiquitin41,42. Ser65 is conserved in both ISG15 and Nedd8, suggesting that they may also be susceptible to phosphorylation.

Recognition of polyubiquitin chains

For depolymerization of polyubiquitinated substrates, an important structural feature is the isopeptide linkage between ubiquitin monomers. The isopeptide bond linking the proximal (lysine-donating) and distal (lysine-accepting) ubiquitin must be stabilized in the active site of the DUB. Comparison of Lys63- and Lys48-linked polyubiquitin chains shows that Lys63 linkages adopt an extended 'beads-on-a-string' conformation in which the only interaction between the ubiquitin moieties is through the isopeptide linkage43,44. This conformation is shared by Met1-linked diubiquitin45. By contrast, the ubiquitin moieties in Lys48-linked polyubiquitin pack closely in a closed conformation through interactions of their Ile44 patches46,47. Similarly, Lys11- and Lys6-linked diubiquitins also adopt more compact conformations, although the Ile44 patch is exposed on both ubiquitins in Lys11-linked diubiquitin and one of the ubiquitins in Lys6-linked diubiquitin48,49,50. Recent structural studies revealed that Lys33-linked ubiquitin chains adopt open and closed conformations in triubiquitin and diubiquitin, respectively51. Nevertheless, polyubiquitin chains are dynamic in solution, and a chain can adopt both closed and open conformations52,53.
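The conformational trends summarized in this paragraph can be collected in a small lookup table. This is only a sketch: the labels "open", "compact" and "length-dependent" are simplifications chosen here, the names are hypothetical, and because chains are dynamic in solution the table records the predominant state described in the text rather than a fixed property.

```python
# Predominant conformations as summarized in the paragraph above.
# Assumption: the category labels are illustrative simplifications.
CHAIN_CONFORMATION = {
    "Lys63": "open",              # extended 'beads-on-a-string'
    "Met1": "open",               # shares the extended conformation
    "Lys48": "compact",           # closed, packed via Ile44 patches
    "Lys11": "compact",
    "Lys6": "compact",
    "Lys33": "length-dependent",  # open in triubiquitin, closed in diubiquitin
}


def predominant_conformation(linkage: str) -> str:
    """Look up the conformation described for a linkage type."""
    return CHAIN_CONFORMATION.get(linkage, "not described")


def linkages_with(conformation: str) -> list[str]:
    """Return all linkage types recorded with the given conformation label."""
    return [k for k, v in CHAIN_CONFORMATION.items() if v == conformation]


if __name__ == "__main__":
    for linkage in ("Lys63", "Lys48", "Lys33", "Lys27"):
        print(linkage, "->", predominant_conformation(linkage))
```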

Compact chains likely cannot be recognized by DUBs unless they undergo significant conformational changes to expose the isopeptide bond. In the case of Lys48-linkages, this remodeling probably also involves exposure of the Ile44 patch so that it is free to interact with the DUB. To our knowledge, no DUB bound to a Lys48-linked diubiquitin has been crystallized. Specificity of DUBs toward different ubiquitin linkages varies among family members. Many JAMM proteases, such as AMSH, are only active against Lys63-linked chains54,55, primarily due to interactions with the proximal ubiquitin moiety33,56. On the other hand, most DUBs belonging to the USP family will hydrolyze many chain types, albeit with different preferences20,57. The OTU DUBs show a striking range of chain linkage preferences. Structural analyses reveal that proper positioning of the proximal ubiquitin on the OTU DUB surface is paramount for this selectivity58 and that the length of the ubiquitin polymer also contributes to specificity58,59.

Ubiquitin/Ubl-induced active-site rearrangement

Thiol protease DUBs and ULPs utilize variations of the classic papain-like Cys-His-Asp/Asn catalytic triad to catalyze hydrolysis of peptide or isopeptide bonds18. To facilitate activation of the nucleophilic Cys by His (serving as the general base), these residues must be precisely oriented in the active site with a His-Cys hydrogen-bond distance within 3.8 Å. However, a common theme emerging from structural studies of these proteases is that the catalytic residues are often in unproductive orientations in the absence of substrate. This misalignment involves displacement of the Cys, the His, or both28,60,61,62,63,64. Substrate binding causes the active site residues to rearrange into a catalytically competent orientation32,63,65,66,67. The earliest example of such a substrate-induced realignment of active site residues was the DUB USP7, which has been discussed extensively20,28,68.
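The geometric criterion in this paragraph amounts to a distance check between two atoms. The sketch below computes the separation between a His imidazole nitrogen and a Cys sulfur from Cartesian coordinates and compares it with a cutoff. The coordinates in the usage example are hypothetical and not taken from any structure, the function names are invented here, and the default cutoff simply mirrors the 3.8 Å value quoted above; borderline cases call for inspection of the structure rather than a hard threshold.

```python
import math

Coordinate = tuple[float, float, float]


def atom_distance(a: Coordinate, b: Coordinate) -> float:
    """Euclidean distance (in Å) between two atoms given as (x, y, z)."""
    return math.dist(a, b)


def is_productively_aligned(
    his_nitrogen: Coordinate, cys_sulfur: Coordinate, cutoff: float = 3.8
) -> bool:
    """True if the His-Cys separation is within the chosen cutoff.

    Assumption: the default cutoff mirrors the value quoted in the text.
    """
    return atom_distance(his_nitrogen, cys_sulfur) <= cutoff


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    cys_sg = (0.0, 0.0, 0.0)
    his_close = (3.0, 0.0, 0.0)
    his_far = (7.7, 0.0, 0.0)
    print(is_productively_aligned(his_close, cys_sg))
    print(is_productively_aligned(his_far, cys_sg))
```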

A striking example of how remote substrate binding induces realignment of the active-site His to a catalytically productive orientation can be seen with the free and ubiquitin-bound forms of UCHL1. Defects in UCHL1, the smallest member of the UCH family of DUBs, have been linked to a variety of diseases including Parkinson's disease and various cancers69,70. In the substrate-free form, the active site His has rotated away from the nucleophilic Cys into a catalytically unproductive orientation, and the two residues are separated by 7.7 Å60 (Figure 4A). Docking of the β1-β2 hairpin loop of ubiquitin into a surface-exposed hydrophobic pocket of UCHL1 located 17 Å from the catalytic triad elicits a cascade of conformational changes in highly conserved Phe residues that bridge the distal and active sites. To accommodate the β-hairpin loop of ubiquitin, UCHL1-Phe213 swings inward into the site normally occupied by Phe53, which, in turn, forces the aromatic side chain of Phe53 to rotate into a steric conflict with the catalytic His161. This causes the His side chain to flip towards the nucleophilic Cys such that the Nδ1 atom of the imidazole ring is now 3.9 Å away from the sulfur atom of Cys9065. These results suggest that interactions with the substrate outside of the DUB active site can contribute to discrimination of ubiquitin from Ubl modifiers (Figure 2A).

Figure 4

The active site residues of unliganded Ubl proteases (shown in white) are often misaligned, but undergo restructuring into productive conformations upon substrate binding (residues from ubiquitin-bound DUBs are colored green). (A) Binding of the β1-β2 hairpin loop of ubiquitin (ubiquitin is colored blue while the loop is magenta) into a hydrophobic pocket on the surface of UCHL1 triggers a series of conformational changes in aromatic side chains, forcing His161 to adopt a productive orientation (PDB codes: 2ETL, ref. 60; 3KW5, ref. 65). (B) The apo structure of OTULIN reveals that H339 is stabilized in an unproductive form by a salt bridge with a neighboring Asp residue (PDB code: 3ZNV, ref. 63). Binding of Met1-linked diubiquitin in the active site of OTULIN imposes a steric clash (magenta dashes) with M1 and E16 of ubiquitin, forcing H339 to flip into the active site (PDB code: 3ZNZ, ref. 63). (C) Binding of Met1-linked diubiquitin to CYLD leads to stabilization of H801 in the active site via Q2 of the proximal ubiquitin (PDB codes: 2VHF, ref. 73; 3WXE, ref. 27). However, the Gly-Gly motif (magenta) is displaced 8 Å from the active site due to E674 from a loop that is stabilized by substrate binding, which appears to block the active site.

OTULIN (Fam105b, gumby) is a member of the OTU family of DUBs known to play roles in innate immune and Wnt signaling, as well as to bind and regulate the linear ubiquitin assembly complex (LUBAC). OTULIN harbors a Cys-His-Asn triad and only cleaves Met1-linked polymers63,71,72. A high-resolution crystal structure of the apo form of OTULIN revealed mixed occupancy of the His and Cys catalytic residues in which 70% of the time, the residues are misaligned in an auto-inhibited state. A local non-catalytic Asp residue plays an inhibitory role by interacting with the catalytic His and pulling it out of hydrogen-bonding distance with the catalytic Cys (Figure 4B). Substrate-assisted realignment of the OTULIN active site into a productive form is facilitated by two unique structural features of Met1-linked diubiquitin63,72. First, the unproductive orientation of the His residue would sterically clash with the Met1 carbonyl group in the proximal ubiquitin moiety of bound linear diubiquitin. Binding of this diubiquitin forces the His residue to assume a productive orientation (Figure 4B). Isopeptide ubiquitin linkages would not induce such a change. A second key to substrate-induced activation is the positioning of Glu16 of the proximal ubiquitin, which both expels the auto-inhibitory Asp from contact with the active site His and hydrogen bonds with the Asn of the catalytic triad to align it within hydrogen-bonding distance of the His residue63, thus forming a catalytically competent active site.

CYLD is a tumor suppressor that belongs to the USP family of DUBs and displays specificity for hydrolysis of both Met1- and Lys63-linked ubiquitin chains45,73. It was crystallized in its ligand-free form with two molecules in the asymmetric unit, in which the active site residues in one molecule were oriented in a productive conformation, while in the other molecule, the side chain of the catalytic His residue was rotated away from the Cys nucleophile (Figure 4C)73. Recent structural characterization of CYLD bound to Lys63-linked diubiquitin in the catalytic state and Met1-linked diubiquitin in both the pre-catalytic and catalytic states revealed that the His side chain is arranged in a catalytically competent orientation with either diubiquitin27. However, no evidence of substrate-induced conformational crosstalk was seen. His is the first residue of a β-strand that follows a loop; thus, it is feasible that mobility of the loop enables the His to sample various conformations in the substrate-free form of CYLD. In both the Met1- and Lys63-linked diubiquitin complexes, the CYLD catalytic His is stabilized by hydrogen-bonding with Gln2 of the proximal ubiquitin, promoting the active conformation.

Intriguingly, an added layer of protection from unwanted cleavage is seen in the pre-catalytic binding state of CYLD with Met1-linked diubiquitin27. The scissile peptide bond of the dimer was offset from the active site Cys by 8 Å, and a nearby loop that had been disordered in the unliganded CYLD structure was stabilized in the Met1- (and Lys63-) linked diubiquitin complex. Most interestingly, a Glu side chain in the loop apparently lies sandwiched between the scissile bond and the nucleophilic Cys, preventing catalytic activation (Figure 4C). Mutation of this residue to Gln led to its displacement from the active site, permitting proper orientation of the scissile peptide bond.

Substrate-induced rearrangement of catalytic residues into productive positions has also been observed in several SENP SUMO protease family members35,74,75,76. Misaligned active site residues have yet to be observed in Nedd8- or ISG15-cleaving enzymes, but few structures have been examined to date. Examples of such misalignment may well be found in all types of DUBs and ULPs.

Rearrangement of active site residues prevents oxidation of catalytic cysteines

One rationale for why DUBs and ULPs might have evolved to adopt misaligned active sites in their substrate-free forms would be to prevent spurious activity against cellular proteins, while ensuring specificity toward the correct ubiquitin- or Ubl-linked conjugates18. Misalignment of catalytic residues may also limit oxidation of active-site cysteines77,78,79. Reactive oxygen species accumulate in cells in response to various types of stress, including UV, heat, and low levels of NADPH and glutathione80. Recently, proteases from the DUB and SENP families were found to be highly susceptible to oxidation of their catalytic cysteines, leading to accumulation of ubiquitin or SUMO conjugates in vivo77,79,81,82. The Atg4 protease, which cleaves the Atg8 Ubl from phosphatidylethanolamine during autophagy, is also sensitive to oxidation83.

Modification of a cysteine residue to sulfenic acid (-SOH) can be reversed in the presence of reducing agents, while formation of Cys-sulfinic acid (-SO2H) and Cys-sulfonic acid (-SO3H) is irreversible. The catalytic cysteines in SENPs can form intermolecular disulfide bonds, which would protect the Cys sulfhydryl from irreversible oxidation79. A20, a tumor suppressor that has both E3 ligase and OTU DUB domains, may use a distinct mechanism to protect the OTU active-site sulfhydryl. A crystal structure of A20 with its catalytic cysteine modified to sulfenic acid revealed that Cys-SOH may limit further oxidation by engaging in several hydrogen bonds with a nearby loop77.

The sulfhydryl group of most cysteine residues is not especially prone to oxidation because its pKa is about 8.5, so it remains largely protonated at physiological pH. However, in a catalytically competent active site, nucleophilic cysteines are more susceptible to oxidation because the His residue, acting as a general base, can deprotonate the Cys. The resulting thiolate anion can be readily oxidized (or can attack the scissile bond of a substrate). Thus, it seems plausible that by requiring realignment of active site residues in the presence of substrate, the reactive thiolate anion only forms when it can be used productively to catalyze cleavage of ubiquitin- or Ubl-linked conjugates.
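The link between pKa and thiolate formation can be illustrated with the Henderson-Hasselbalch relationship. The sketch below estimates the fraction of a cysteine side chain present as the thiolate anion at a given pH. The pKa of 8.5 comes from the text; the pH of 7.4 and the lowered pKa of 6.5 used for comparison are illustrative assumptions, not values reported in this review, and the function name is hypothetical.

```python
def thiolate_fraction(pka: float, ph: float) -> float:
    """Fraction of thiol deprotonated to thiolate at a given pH.

    Derived from Henderson-Hasselbalch: [S-]/([SH] + [S-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    assumed_ph = 7.4  # assumption: a typical physiological pH
    # pKa 8.5 is the value quoted in the text for most cysteines.
    print(f"pKa 8.5: {thiolate_fraction(8.5, assumed_ph):.3f}")
    # Hypothetical lowered pKa, standing in for a His-activated catalytic Cys.
    print(f"pKa 6.5: {thiolate_fraction(6.5, assumed_ph):.3f}")
```

With these inputs, only a small minority of an ordinary cysteine is deprotonated, whereas a cysteine with a lowered effective pKa is mostly thiolate, which is consistent with the greater oxidation sensitivity of an aligned catalytic site.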

Active site loops that restrict substrates based on size

An active-site crossover loop is a key structural element unique to the UCH family of DUBs and has been proposed to play a critical role in selection of substrates68. This loop joins an α-helix and β-sheet on opposite sides of the catalytic groove and ranges from 11 to 21 residues in length84. The UCHL1 crossover loop, the shortest known, assumes a somewhat rigid and open structure in the absence of substrate60 that can accommodate the C-terminus of a ubiquitinated substrate65. In contrast, crystal structures of apo UCHL3 (Figure 5A) and UCHL5 (UCH37) reveal that the loop is largely disordered, suggesting that it can sample a variety of conformations in solution62,64,85,86. Binding of ubiquitin leads to apparent stabilization of the crossover loop, as it was seen straddling the active site in co-crystal structures of ubiquitin aldehyde bound to UCHL3 (Figure 5A and 5B) and the yeast UCH Yuh162,87. The orientation of the ubiquitin C-terminal tail beneath the crossover loop indicates that a ubiquitinated substrate must be threaded through the loop, but the loop is too small to accommodate large substrates, including a ubiquitin dimer60,87. Accordingly, UCH enzymes on their own are largely incapable of disassembling ubiquitin dimers in vitro55. However, association of cellular cofactors, such as the proteasome ubiquitin receptor Rpn13 with UCHL5 or ASXL1 with BAP1, stimulates isopeptidase activity88,89.

Figure 5

The active site crossover loop of UCH family enzymes is dynamic and restricts substrate size. In its substrate-free form, the crossover loop for UCHL3 is unstructured (A), but becomes ordered (green) when ubiquitin (colored blue) binds in the active site (B) (PDB codes: 1UCH, ref. 150; 1XD3, ref. 87). Surprisingly, even in the presence of ubiquitin, the active site crossover loop of UCHL5 remained disordered (PDB code: 4IG7, ref. 32) (C). Binding of its cofactor, Rpn13, to UCHL5's C-terminal UCH37-like domain (ULD, colored purple) (D) stabilizes a portion of the crossover loop (green; PDB code: 4WLR, ref. 67). (E) Sequence alignment of the crossover loops indicates very little conservation.

Due to the steric constraints imposed by these crossover loops, especially in UCHL1 and UCHL3, which possess the shortest loops, it has been proposed that most substrates for this family of DUBs will have small leaving groups, including short precursor peptides, at the C-terminus of ubiquitin90,91. Indeed, extending or shortening the crossover loop can alter UCH specificity; at the same time, these loops are poorly conserved among UCH family members (Figure 5E). These results suggest that the size of the crossover loop, and not its sequence, is usually its key contribution to substrate selectivity84,91.

Surprisingly, even in the presence of ubiquitin, the crossover loop was found to be entirely unstructured in a UCHL5 ortholog from Trichinella spiralis (Figure 5C)32. Based on this observation, it was proposed that UCHL5 might require its proteasomal cofactor, Rpn13, to fully stabilize the loop. In fact, crystal structures of a ternary complex of UCHL5 bound to Rpn13 and ubiquitin66,67 revealed that a segment of the crossover loop (Met148 and Phe149) interacts directly with Rpn13, while the rest remains disordered (Figure 5D). These interactions pull a portion of the loop away from the active site, presumably opening it up for optimal binding of substrates. Rpn13 also binds to the C-terminal UCH37-like domain (ULD) of the UCH, locking the ULD into a favorable conformation for ubiquitin binding. A recent study suggests that ASXL1 might activate BAP1 in a similar manner92.

Insertions in catalytic domains contribute to substrate specificity

The DUB and ULP enzyme families are defined by conserved sequence features of their catalytic domains. However, many bear unique terminal extensions or catalytic domain insertions. Here we will discuss several examples of how these insertions can contribute to the substrate specificity of individual enzymes in a family.

Insertions in JAMM metalloproteases

The JAMM family of metalloproteases shares a conserved MPN core consisting of an 8-stranded β-sheet sandwiched between two α-helices, resembling a partially curled β barrel93,94. Although sequence conservation is poor among family members, superposition of their MPN domains points toward conservation of the structural core, with small differences in the length of secondary structure elements95. Typically, JAMM proteins are found as part of complexes of at least two proteins, in which one subunit has an MPN+ domain (the MPN+ domain bears all the residues required for active site Zn2+ coordination, the “JAMM motif”) and the other has an inactive MPN domain (JAMM motif absent). A key to substrate specificity in this class of enzymes originates from two divergent insertion segments, Ins-1 and Ins-2, which are unique to each family member (Figure 6A)94,96,97,98,99,100,101.

Figure 6

Insertions within the catalytic domains of DUBs and ULPs also contribute to substrate specificity. (A) Ins-1 (black) and Ins-2 (cyan) segments are shown within the JAMM core of AMSH-LP (PDB code: 2ZNR, ref. 33), compared to the protease AfJAMM (PDB code: 1R5X, ref. 93). (B) E104 of the Ins-1 segment of CSN5, colored black, stabilizes the Ins-1 loop in an inactive conformation (PDB code: 4D10, ref. 99). (C) N275 of a nearby proteasome lid subunit, Rpn5 (yellow), appears to inhibit Rpn11 by serving as the fourth ligand in a water-mediated interaction with the Rpn11 active site zinc (PDB code: 3JCK, ref. 116). This locks the Ins-1 segment of Rpn11 in a closed conformation. (D) CYLD specificity for Met1- and Lys63-linked polyubiquitin is linked to truncations of various structural elements within the catalytic domain compared to other USP family members (PDB codes: 3WXF, ref. 27; 1NB8, ref. 28). Shown in red are structural features of USP7 that are absent in CYLD. These truncations and an insertion unique to CYLD (purple) shift polyubiquitin recognition from the distal ubiquitin (blue) to the proximal ubiquitin (pink). (E) A chimeric construct of SENP2 containing the Loop-1 insertion of SENP6 (purple; PDB code: 3ZO5, ref. 120) illustrates SENP6/7 preference for SUMO2/3 (SUMO2 is colored blue), as the loop binds to a negative patch of residues (red) not conserved in SUMO1.

AMSH is a JAMM protease involved in endosomal sorting of ubiquitinated cell surface receptors102. Crystal structures of AMSH-like protein (AMSH-LP) and an AMSH ortholog from S. pombe bound to Lys63-linked diubiquitin revealed that the molecular basis for AMSH's exquisite specificity for Lys63-linked chains arises largely from the Ins-2 segment and its binding to the proximal ubiquitin moiety33,103.

The C-terminal tail of the distal ubiquitin moiety binds the active site cleft in an extended β-strand conformation and is stabilized by extensive hydrogen bonding with a β-strand from the Ins-1 segment. These interactions between distal ubiquitin and Ins-1 position the scissile isopeptide bond for hydrolysis. The Ins-2 segment of AMSH is a loop that forms a flap structure near the active site and is stabilized by coordination of a second, non-catalytic Zn2+ ion. Ins-2 dictates substrate specificity for Lys63-linked ubiquitin chains by hydrogen bonding with Gln62 and Glu64 of the proximal ubiquitin moiety while a conserved Phe from the flap region appears to clamp down on the isopeptide bond of diubiquitin33,103. Mutation of the conserved Phe led to reduced selectivity for Lys63-linked diubiquitin, possibly due to increased flexibility of the Ins-2 loop104. Binding to its cofactor STAM2 stimulates AMSH deubiquitinating activity. A UIM-SH3 domain from STAM2 binds to both AMSH and the Ile44 patch on the proximal ubiquitin, serving to further stabilize the position of the proximal ubiquitin moiety on the DUB102,105,106.

CSN5 is a JAMM protease that is a part of a protein complex known as the COP9 signalosome (CSN)107. The main role of CSN5 is to deneddylate cullin-RING E3 ligases (CRLs)108,109; however, the activity of CSN5 is severely reduced when it is not part of the CSN110,111. The Ins-1 region of CSN5 was recently shown to occlude the active site, thus preventing Nedd8 binding, and its high crystallographic B-factors point to a loop structure that can assume multiple conformations96. The crystal structure of CSN showed that, like Rpn11 (see below), CSN5 is seemingly auto-inhibited by altered tetrahedral coordination of the active site Zn2+ ion99. Glu104 of the Ins-1 segment occupies the fourth coordination position of Zn2+, displacing the nucleophilic water from the active site and stabilizing CSN5 in an inactive conformation (Figure 6B). Thus, activation of CSN5 deneddylase activity likely requires a major conformational change of the Ins-1 segment, which is probably induced by binding of a neddylated CRL99.

The proteasomal DUB Rpn11

Another JAMM family DUB, Rpn11, is one of nine subunits of the proteasome 'lid', a subcomplex of the regulatory particle (RP) of the proteasome112. Substrates that have been earmarked for degradation by the proteasome are deubiquitinated en bloc by Rpn11 in an ATP-dependent manner89,94. Rpn11 is most active as part of the full proteasome; it is inactive when purified on its own or in the isolated lid subcomplex113. Rpn8, which possesses an inactive MPN domain, forms a heterodimer with Rpn11, and this complex displays residual DUB activity in vitro114. Recent crystal structures of the Rpn8-Rpn11 heterodimer revealed that the Ins-2 segment of Rpn11 has a function entirely different from that of the Ins-2 loop of AMSH115,116. In the Rpn8-Rpn11 structures, the Ins-2 loop was disordered; however, modeling the crystal structure into the averaged electron density used for a proteasome cryoEM reconstruction indicated that the Ins-2 loop interacts with Rpn2, a subunit belonging to the subcomplex of the RP known as the 'base', to aid in the positioning of Rpn11. The ability of Rpn11 to cleave all seven polyubiquitin lysine linkages may arise from the absence of structural elements that contact the proximal ubiquitin116.

The structural studies also showed that Rpn11 Ins-1 differs vastly in structure from Ins-1 of AMSH-LP and appears to occlude the Rpn11 active site. However, instead of augmenting DUB activity, mutations in the Ins-1 loop impair it, suggesting that the loop is not inhibitory, but is in fact necessary for catalysis. High crystallographic B-factors suggest that in the absence of substrates, the Ins-1 loop is dynamic. It was proposed that once a ubiquitinated substrate enters the catalytic site, Ins-1 will clamp down over the substrate to help position it for cleavage116.

A recent 3.5 Å cryoEM structure of the proteasome lid offers further insight into what keeps Rpn11 inactive when outside of the full proteasome117. An α-helix of the lid subunit Rpn5 sterically blocks the top of the Rpn11 catalytic cleft, and several residues from the N-terminal end of the α-helix directly interact with loops surrounding the catalytic Zn2+ ion. Strikingly, Asn275 from Rpn5 appears to insert itself near the Rpn11 active site and stabilize tetrahedral coordination of its Zn2+ through a bridging interaction with a coordinated water molecule (Figure 6C). These interactions may lock the Ins-1 loop in a closed conformation, occluding the active site. When the lid is incorporated into the proteasome, conformational changes in the Rpn8-Rpn11 heterodimer are proposed to lead to its activation through distortion of the Rpn5-Rpn11 and Rpn9-Rpn8 contact sites.

Insertions in the catalytic domains of USP DUBs and SENP ULPs

Like JAMM domains, USP catalytic domains, which range in size from 300 to 900 residues, also often have insertions. A detailed analysis of USP domain architecture revealed that within the conserved USP core there are five potential loop locations for insertions118. In most cases, the insertions are predicted to fold into independent domains and, depending on their location, may influence DUB activity118. A striking example of how an insertion in the catalytic domain affects substrate specificity comes again from CYLD, a USP that deviates from the canonical USP fold due to several truncations and an insertion27,73. As mentioned earlier, CYLD is highly specific for Met1 and Lys63 ubiquitin linkages. Truncation of structural elements typically involved in distal ubiquitin binding in other USPs reduces the affinity of CYLD for ubiquitin (Figure 6D). At the same time, an insertion segment unique to CYLD, the β9-β10 strands, interacts with the Phe4 hydrophobic patch of the proximal ubiquitin (Figure 6D), an interaction specific to Met1 and Lys63 linkages27. Deletion of the β9-β10 segment diminishes DUB activity against Lys63 ubiquitin linkages, underscoring its role in substrate specificity73. Furthermore, Glu16 of the proximal ubiquitin moiety occupies a CYLD binding pocket created as a result of truncation of the β6-β7 loop. These modifications to the CYLD USP domain architecture contribute to specificity by shifting the burden of ubiquitin recognition from the distal ubiquitin to the proximal ubiquitin molecule.

The mammalian SUMO proteases SENP6 and SENP7 display high specificity for the SUMO2/3 isoforms (which are nearly identical) over the SUMO1 protein. They are characterized by the presence of four loop insertions within their (poorly conserved) catalytic domains. The crystal structure of SENP7 revealed that the Loop-1 and Loop-2 insertions are found on the protease surface at positions that would likely contact SUMO119. Deletion of Loop-1, but not Loop-2, impaired SENP7 SUMO2/3-cleaving activity, suggesting that Loop-1 plays a key role in determining the specificity of SENP7 for SUMO2/3119. The crystal structure of a chimeric SENP2 fusion harboring the Loop-1 segment from SENP6 bound to SUMO2 revealed that Loop-1, an eight-residue element, extends the binding interface between the protease and SUMO2. A negative patch of amino acids unique to SUMO2 (Asn68, Asp71 and Glu77) directly contacts the SENP6 Loop-1 (Figure 6E)120. These residues are replaced by Ala, His and Gly in SUMO1, which implicates both the Loop-1 insertion in SENP6/7 and the negative patch of SUMO2 as key determinants of SUMO isoform specificity.

Substrate specificity of non-eukaryotic DUBs and ULPs

Although ubiquitin/Ubl modification systems were originally thought to be unique to eukaryotes, antecedents of all components of these systems have now been detected in archaea and bacteria14,121. Among the proteases that (putatively) cleave Ubls from prokaryotic proteins, the JAMM proteases are the most common based on bioinformatic analyses121. Viruses also encode select components of these systems122. Interestingly, a number of DUBs and ULPs have been identified in pathogenic bacteria and bacteria that reside within eukaryotic cells123. These enzymes are often most similar in sequence to eukaryotic enzymes, suggesting possible horizontal gene transfer26,124,125, although in some cases their ability to cleave ubiquitin or Ubls may have evolved independently from a primordial protease core126. No Ubl-type modifier systems exist in these bacteria; instead these ubiquitin- or Ubl-specific proteases are injected as effector proteins through a specialized secretion apparatus into the eukaryotic host127,128,129.

ULPs and DUBs have been described in the genomes of obligate intracellular and pathogenic Gram-negative strains of bacteria. These enzymes include the ChlaDUBs and ChlaOTU of Chlamydia128,130,131, the putative wPa_0283 ULP of Wolbachia124, a putative USP from Cardinium132, SdeA of Legionella26,133, SseL of Salmonella134, XopD of Xanthomonas129,135, and ElaD of E. coli125. Certain viruses also encode ULPs or DUBs. Examples include DNA viruses of the Adenoviridae, Poxviridae and Herpesviridae families122,136, and RNA viruses such as Crimean-Congo hemorrhagic fever virus (CCHFV) and turnip yellow mosaic virus (TYMV), which have OTU proteases137.

These bacterial and viral proteases have provided useful models for predicting the ubiquitin and Ubl specificity of DUBs and ULPs. Surprisingly, many turn out to have specificities different from the eukaryotic enzymes to which they are most closely related. For example, yeast Ulp1, ElaD, and SdeA, all members of the C48 Ulp1-like family of enzymes, are expected to share similar folds and catalytic sites and thus might have been predicted to cleave SUMO conjugates based on the original studies on Ulp1126. However, these enzymes have very different Ubl preferences: Ulp1 cleaves SUMO126,138; ElaD cleaves ubiquitin125; and SdeA cleaves both ubiquitin and Nedd826.

Viral ULPs generally cleave both viral precursor proteins and host Ubl-protein conjugates. A well-studied example is the adenovirus proteinase (AVP), which is a member of the C48 Ulp1-like protease family139. AVP cleaves specific virion precursor proteins to their mature forms140,141. Interestingly, AVP was identified in a screen for DUBs using a ubiquitin-aldehyde inhibitor and found to have DUB activity against both ubiquitin and ISG15122. Despite its closest eukaryotic counterparts being the SUMO-cleaving ULPs, AVP fails to cleave SUMO substrates. Its inability to cleave SUMO can be explained by the lack of specific sequence elements upstream of the core Ulp1 protease domain that contact SUMO74,126; however, its dual specificity for ubiquitin and ISG15 has not yet been rationalized122.

Viral OTU proteases, such as the enzyme from CCHFV, also cleave both ubiquitin and ISG15 conjugates142. This broadened specificity compared to eukaryotic OTU enzymes, which are ubiquitin-specific, was traced to the manner in which the CCHFV OTU protease positions its substrates. In the crystal structures, the bound ubiquitin and C-terminal Ubl domain of ISG15 were rotated 75° on the viral protease surface relative to ubiquitin bound to a yeast OTU protease. Different determinants in the viral enzyme direct ISG15 and ubiquitin binding, allowing the specificity of the enzyme to be manipulated by specific mutations143,144. A similar rotation was observed for substrate binding to an arterivirus OTU145. These results show that structural data from DUB/ULP-substrate complexes permit ubiquitin and Ubl specificity to be dissected experimentally.

An interesting example of the challenges in predicting Ubl protease substrate specificity comes from a study of XopD, a C48 Ulp1-like protease derived from a plant pathogen138. XopD is the only verified prokaryotic SUMO protease. The XopD crystal structure showed a similar fold to yeast Ulp1 from a Ulp1-SUMO co-crystal structure74, but the two enzymes have distinct specificities. C-terminal residues in particular plant SUMOs upstream of the cleavage site appear to guide XopD binding to these specific SUMO isoforms. By contrast, Ulp1 is far more promiscuous and can cleave not only these plant SUMOs, but also yeast and mammalian SUMO-fusion substrates. Among other differences, a conserved hydrophobic residue in Ulp1, Phe474, which is required for its function, has been replaced in XopD orthologs by glutamine. Similarly, this residue is altered in the Ulp1-like ChlaDUBs, but these enzymes hydrolyze ubiquitin rather than SUMO; it is also a glutamine in SENP8/Den1, which is a deneddylase138. Therefore, the Phe474-equivalent position in Ulp1-like enzymes may be important for dictating Ubl preferences, but it remains difficult to predict what those preferences are.

Concluding remarks

Although many advances have been made over the past 25 years in understanding the molecular basis of substrate specificity of DUBs and ULPs, many questions linger. An issue we sought to address at the outset of this review concerned whether Ubl or ubiquitin (chain) specificity for a particular DUB or ULP could be inferred solely from its amino acid sequence. As outlined here, this remains very challenging because DUBs and ULPs use multiple mechanisms to discriminate among substrates. Low sequence conservation among DUBs and ULPs adds to the difficulty. Further structural and biochemical studies of ubiquitin/Ubl proteases and their substrates should enhance our ability to predict the function and specificity of these proteases and to understand their detailed mechanisms.