Introduction

For cells and organisms to survive and adapt to changing conditions, complex, tightly controlled, and context-dependent regulation is crucial. Much of this regulation is achieved by post-translational modifications (PTMs) that can change the behavior of a protein (as well as by corresponding modifications of other macromolecules such as DNA and RNA). One of the major regulatory modifications is the acetylation and deacetylation of histones, a main component of the histone code that dictates chromatin organization and transcriptional activity1,2; these reactions are mediated by lysine acetyltransferases and deacetylases (HATs and HDACs, respectively). HDACs catalyze the removal of the acetyl group from acetylated lysine residues in proteins.

Lysine deacetylase 6 (HDAC6) is a class IIb Zn²⁺-dependent deacetylase. It is the only HDAC that contains two deacetylase domains, and these domains have distinct specificities. The first domain specifically deacetylates acetylated C-terminal lysine residues, while the second shows a particularly broad substrate selectivity3,4. There is evidence that HDAC6 catalyzes the deacetylation of several proteins involved in a variety of cellular processes. Among them, HDAC6-mediated deacetylation of α-tubulin regulates microtubule stability and cell motility5. Another characterized substrate, cortactin, binds actin filaments once deacetylated and participates in the fusion of lysosomes and autophagosomes6. The enzyme also plays a role in protein folding by regulating the activity of the chaperone Hsp90 via deacetylation7,8. In addition, HDAC6 is an important player in innate immunity: it regulates the detection of pathogen genomic material via deacetylation of the retinoic acid-inducible gene I (RIG-I) protein9,10,11. Although the broad specificity of HDAC6 has been reported, its selectivity determinants are still not fully understood. The structural basis that makes this HDAC more promiscuous than others, such as HDAC812, is also not well understood.

Beyond the few known HDAC6 substrates mentioned above, the substrate selectivity of human HDAC6 has been assessed at large scale in three key experimental studies3,13,14. Riester et al. used arrays of tripeptides conjugated to 7-amino-4-methylcoumarin (AMC) and measured enzymatic activity by the change in fluorescence13 (referred to as dataset D-3MER in this study). Schölz et al. applied specific inhibitors against several HDACs in cell lines and quantified the change in acetylated lysine sites by SILAC-MS (dataset D-SILAC)14. Both studies examined at least five different HDACs, and both concluded that HDAC6 was by far the most promiscuous of them. Finally, Kutil et al. assessed HDAC6 deacetylation of 13-mer peptides synthesized on an array, measuring activity with a mixture of anti-acetyl-lysine antibodies (dataset D-13MER). They also verified 20 substrate hits by high-performance liquid chromatography (HPLC; dataset D-HPLC)3. In that study, the substrates identified were also compared with hits from other studies, and minimal overlap was found3 (see Table S1 for a summary of all studies of HDAC6 substrates).

Many of the enzymes that add or remove PTMs act on short linear motifs (SLiMs) that are often exposed. Their substrate selectivity may therefore be approximated by short peptides covering the modified region15. Different types of prediction methods have been developed to find putative substrates. Many sequence-based methods predict modification sites using position-specific scoring matrices (PSSMs) or regular expressions16. These are derived from large sets of substrates, often identified by high-throughput approaches such as those described above. However, such methods do not account for possible interdependencies between amino acids at different positions of the substrate, nor do they consider secondary structure that might be important for recognition. Machine learning-based approaches can address these limitations (e.g., hidden Markov models17 and naive Bayes18), but they require considerable amounts of data19,20,21. Moreover, enzyme substrate patterns cannot always be adequately captured by a sequence-based description, as in the case of O-glycosylation22 or HIV-1 protease substrates23.
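The sketch below illustrates the PSSM approach and its main limitation. It builds a log-odds matrix from aligned hexamers and scores a peptide as a plain sum over positions, so no dependence between positions can be represented. It is a minimal sketch that assumes a uniform amino-acid background and a pseudocount of 1. Apart from EGKFVR, the peptides are hypothetical, and the function names are illustrative only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Return one log2-odds column per position (uniform background assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def pssm_score(pssm, peptide):
    """Additive score: each position contributes independently of the others."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Hypothetical training hexamers (positions P-2..P+3, acetyl-lysine at index 2).
substrates = ["EGKFVR", "SGKYLR", "AGKFIE"]
pssm = build_pssm(substrates)
print(round(pssm_score(pssm, "EGKFVR"), 2), round(pssm_score(pssm, "EPKDDD"), 2))
```

Because the score is additive, a residue pair that is only favorable in combination cannot be rewarded. This is the gap that structure-based or machine-learning methods aim to fill.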

Structure-based methods can complement sequence-based methods, particularly for non-canonical motifs22,24, as we have previously shown for PTM enzymes using the Rosetta FlexPepBind protocol12,25,26. This approach assumes that the ability of the local substrate peptide sequence to bind in a catalysis-competent conformation is a main determinant of enzyme selectivity. The binding energy of such enzyme–substrate complex structures can thus serve as a proxy for substrate activity. The protocol is first calibrated on peptides with measured activities, and its accuracy can then be estimated by applying it to an independent test set. Finally, new substrates can be identified by applying the calibrated protocol to candidate peptides of unknown activity.

In this study, we used an accurate biochemical assay that measures acetate production following deacetylation27. With it, we quantified the catalytic activity of HDAC6 toward specific peptides and established a gold-standard set of peptide substrates. Based on these activities, we calibrated FlexPepBind to evaluate the activity of potential substrates, as we have done previously for other HDACs and PTM enzymes (e.g., HDAC8 and FTase12,26). Calibration revealed structural differences between HDAC6 and HDAC8 that underlie the considerable difference in selectivity between these two deacetylases. Applying this method to screen the acetylome identified potential novel regulatory mechanisms that depend on HDAC6. Overall, our structure-based approach rests on accurate in vitro measurements of substrate activity. Combining it with previously reported large-scale approaches leads to a better understanding of HDAC6 substrate selectivity and biological function.

Results

Accurate measure of HDAC6 substrate activity using a biochemical enzymatic assay

We used an enzyme-coupled acetate detection assay, or simply the ‘acetate assay’ (see “Methods”), developed and applied previously12,27, to measure the catalytic efficiency (kcat/KM) of HDAC6. We measured HDAC6-catalyzed deacetylation of two groups of acetylated peptides: peptides with sequences taken from known substrate proteins, and selected peptides with reported acetylation sites that were available to us from previous studies (Table 1). For these experiments we used the second deacetylase domain (DD2) of HDAC6 from Danio rerio, which is more stable and has been shown to be a valid substitute for human HDAC64. To ensure an accurate determination of kcat/KM, we measured HDAC6-catalyzed deacetylation at a minimum of four peptide concentrations, with at least two concentrations below KM (Fig. 1, squares). A total of 26 peptides met these criteria and formed our training set (D-TRAINING, Table 1). Additionally, we included 16 peptides for which kcat was measured accurately but KM was below the limit of detection of the acetate assay (~10–20 μM substrate; circles in Fig. 1). For these, only a lower limit for kcat/KM could be determined (set D-CAPPED, see below and Table 1). The measured kcat/KM values span three orders of magnitude.
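The inclusion rules above can be expressed as a small decision function. This is a hypothetical sketch, not the authors' code. The function name is illustrative, the 10 μM detection limit is taken from the lower end of the quoted ~10–20 μM range, and it assumes kcat itself was measured accurately for D-CAPPED peptides.

```python
def classify_measurement(concentrations_um, km_um, detection_limit_um=10.0):
    """Assign a kinetic measurement to a dataset (hypothetical helper).

    detection_limit_um=10.0 is an assumption taken from the ~10-20 uM range.
    """
    if km_um < detection_limit_um:
        # KM cannot be resolved, so only a lower limit for kcat/KM is reported.
        return "D-CAPPED"
    below_km = sum(1 for c in concentrations_um if c < km_um)
    if len(concentrations_um) >= 4 and below_km >= 2:
        return "D-TRAINING"  # enough points, including two below KM
    return "excluded"


print(classify_measurement([5, 10, 50, 100], km_um=30.0))
print(classify_measurement([5, 10, 50, 100], km_um=4.0))
```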

Table 1 Activities of 42 selected peptides, as measured by the acetate assay. (A) Peptide measurements with precise kcat/KM values (D-TRAINING set). (B) Peptide substrates with kcat/KM values with lower limits: KM values were below the detection limit (D-CAPPED set). Together these two sets form the D-EXTENDED set.
Figure 1

Dependence of deacetylation rate on substrate concentration for two representative peptides catalyzed by HDAC6, measured using the acetate assay. The initial velocity at each substrate concentration was determined by linear regression of a time course of at least three time points; standard errors are shown. The kinetic parameters were determined by a nonlinear least-squares fit of the Michaelis–Menten equation to the data and are listed in Table 1. Black squares: example of data that met the criteria for accurate kcat/KM values. Green circles: example of overfit data. Here only a lower limit of kcat/KM could be calculated, because KM is below the detection limit of the assay.
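The fitting step can be illustrated with a dependency-free sketch. It starts from a Lineweaver–Burk (double-reciprocal) estimate and refines it by Gauss–Newton nonlinear least squares on v = Vmax[S]/(KM + [S]), then converts Vmax to kcat. This is not the authors' fitting code. The enzyme concentration and kinetic constants are hypothetical, and the data are synthetic and noise-free. With real, noisy data a damped or library-based optimizer would be more robust.

```python
def fit_michaelis_menten(s, v, iterations=50):
    """Fit v = Vmax*s/(Km + s); returns (Vmax, Km). Units follow the inputs."""
    # Starting point from the double-reciprocal line 1/v = (Km/Vmax)(1/s) + 1/Vmax.
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    vmax, km = 1.0 / intercept, slope / intercept
    # Gauss-Newton refinement of the sum of squared residuals in v.
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi in zip(s, v):
            denom = km + si
            j_vmax = si / denom              # dv/dVmax
            j_km = -vmax * si / denom ** 2   # dv/dKm
            resid = vi - vmax * si / denom
            a11 += j_vmax * j_vmax
            a12 += j_vmax * j_km
            a22 += j_km * j_km
            b1 += j_vmax * resid
            b2 += j_km * resid
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations by Cramer's rule.
        vmax += (a22 * b1 - a12 * b2) / det
        km += (a11 * b2 - a12 * b1) / det
    return vmax, km


E_TOTAL_UM = 0.05                      # hypothetical enzyme concentration (uM)
conc_um = [5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
rates = [2.0 * E_TOTAL_UM * c / (20.0 + c) for c in conc_um]  # kcat = 2 s^-1, KM = 20 uM
vmax, km = fit_michaelis_menten(conc_um, rates)
kcat = vmax / E_TOTAL_UM
kcat_over_km = kcat / km * 1e6         # uM^-1 s^-1 -> M^-1 s^-1
print(f"kcat/KM = {kcat_over_km:.3g} M^-1 s^-1")
```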

To assess whether activities measured on isolated peptides appropriately reflect HDAC6 activity on the natural substrate protein, we measured HDAC6 activity toward a full-length, singly acetylated histone protein and toward a corresponding 13-mer peptide analog. Catalytic efficiencies were similar, albeit higher for the peptide (14 × 10⁴ M⁻¹ s⁻¹ vs. 8 × 10⁴ M⁻¹ s⁻¹ for the full-length substrate), mostly owing to a higher kcat (Table S2). Enzymes that are catalytically efficient in the cell have kcat/KM values of at least 10⁴ M⁻¹ s⁻¹ 28. Based on this observation and on our results, we used this value as a cutoff to distinguish substrates from slow substrates or non-substrates. In cells, HDAC6 selectivity is determined by relative reactivity, so substrates with high kcat/KM values will outcompete those with low values when present at similar concentrations. Under such conditions, slow substrates can be approximated as non-substrates in the cellular context. Additional peptides that were tested but not used in further analysis (i.e., peptides longer than six amino acids or peptides that could not be measured reliably) are listed in Supplementary Table S3.
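The competition argument follows from a standard steady-state result. For two substrates A and B competing for the same active site, v_A/v_B = (kcat/KM)_A[A] / ((kcat/KM)_B[B]), regardless of saturation. The sketch applies this relation together with the 10⁴ M⁻¹ s⁻¹ cutoff. The histone peptide value (14 × 10⁴ M⁻¹ s⁻¹) comes from the text. The slow-substrate value of 10³ M⁻¹ s⁻¹ and the equal 1 μM concentrations are hypothetical.

```python
SUBSTRATE_CUTOFF = 1e4  # M^-1 s^-1, cutoff adopted in the text


def is_substrate(kcat_over_km):
    """Binary call: at least 1e4 M^-1 s^-1 counts as a substrate."""
    return kcat_over_km >= SUBSTRATE_CUTOFF


def rate_ratio(eff_a, conc_a, eff_b, conc_b):
    """v_A / v_B for two substrates competing for the same enzyme active site."""
    return (eff_a * conc_a) / (eff_b * conc_b)


# Histone peptide value from the text vs. a hypothetical slow substrate, both at 1 uM.
print(rate_ratio(1.4e5, 1e-6, 1.0e3, 1e-6))
```

At equal concentrations, the faster substrate is turned over about 140 times more often, which is why slow substrates behave as non-substrates in this setting.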

Structure-based computational prediction can identify most HDAC6 substrates

To detect new potential substrates for HDAC6, we calibrated a structure-based protocol based on the kcat/KM values for HDAC6-catalyzed deacetylation of hexamer peptides (Table 1), using the FlexPepBind framework, previously applied to HDAC8 and FTase enzymes12,26. Here we provide a general outline of the calibration process. For details we refer to the “Methods” section.

First, the HDAC6 substrate with the highest kcat/KM value (EGKAcFVR, derived from prelamin A; see Table 1) was docked into the binding pocket of a solved HDAC6 DD2 structure (Protein Data Bank29, PDB ID: 6WSJ30) using Rosetta FlexPepDock refinement31. The five top-scoring models were selected as templates. Peptides from the D-TRAINING set were then threaded onto this template set, and the peptide–receptor interface was minimized (see “Methods”). For each peptide, the top-scoring model was used to estimate its ability to bind HDAC6 in a catalysis-competent conformation (enforced by constraints, see “Methods”). We evaluated the protocol in two ways: by its ability to separate substrates from non-substrates (area under the ROC curve, AUC), and by Spearman’s ρ correlation between experimental values and Rosetta scores. Runs were performed with or without receptor backbone minimization at both the refinement (ref vs. refmin) and threading (thread vs. threadmin) steps; four different protocols were thus tested (see summary in Table 2). Throughout this study, scoring was performed using the Rosetta reweighted score32.
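Below is a minimal sketch of the two performance metrics, written without external libraries. Lower (more negative) Rosetta scores indicate better predicted binding. The AUC is therefore computed as the probability that a substrate scores lower than a non-substrate, and a good protocol yields a negative Spearman ρ between kcat/KM and score. Substrates are called with the 10⁴ M⁻¹ s⁻¹ cutoff from the text. The five activity/score pairs are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def auc_lower_is_better(scores, is_substrate):
    """P(substrate score < non-substrate score); ties count as one half."""
    pos = [s for s, lab in zip(scores, is_substrate) if lab]
    neg = [s for s, lab in zip(scores, is_substrate) if not lab]
    wins = sum(1.0 if p < q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


kcat_km = [2e5, 5e4, 1.2e4, 8e3, 2e3]                 # hypothetical, M^-1 s^-1
scores = [-1125.0, -1110.0, -1112.0, -1100.0, -1090.0]  # hypothetical reweighted scores
labels = [k >= 1e4 for k in kcat_km]
auc = auc_lower_is_better(scores, labels)
rho = spearman(kcat_km, scores)
print(auc, round(rho, 2))
```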

Table 2 FlexPepBind protocols evaluated in this study. (A) Performance metrics of the protocols on the D-TRAINING dataset (n = 26) at the loose cutoff. Values for the D-EXTENDED dataset (including both D-TRAINING and D-CAPPED peptides, n = 26 + 16) are given in parentheses. Since the activities of the D-CAPPED dataset are not exact, no correlation was calculated for the merged dataset. Results are shown for the 6WSJ template. (B) Performance metrics for different crystal structures used as templates, using the refmin_thread protocol.

The best performance (AUC = 0.78, ρ = −0.66, Fig. 2) was achieved when the docking step included backbone minimization but the subsequent threading step did not (i.e., refmin_thread). We defined a loose cutoff (reweighted score: −1105) and a strict cutoff (reweighted score: −1118), allowing a maximum of one and zero false positives, respectively. We then evaluated performance on the D-CAPPED set (see Table 1). Exact activity values could not be measured for this set, but it was estimated to consist of substrates only. Reassuringly, 12 of the 16 substrates passed the strict threshold, and all 16 passed the loose threshold (Fig. 2A).
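Such cutoffs can be derived by taking the loosest observed substrate score that still admits no more than the allowed number of non-substrates. The sketch below follows that rule under our own assumptions: the function names and data are hypothetical, and a peptide is called a substrate when its score is at or below the cutoff.

```python
def loosest_cutoff(scores, is_substrate, max_fp):
    """Highest substrate score that lets through at most max_fp non-substrates."""
    neg = sorted(s for s, lab in zip(scores, is_substrate) if not lab)
    # The cutoff must stay strictly below the (max_fp + 1)-th best non-substrate.
    limit = neg[max_fp] if max_fp < len(neg) else float("inf")
    passing = [s for s, lab in zip(scores, is_substrate) if lab and s < limit]
    return max(passing) if passing else None


def false_positives(scores, is_substrate, cutoff):
    """Non-substrates predicted as substrates (score at or below the cutoff)."""
    return sum(1 for s, lab in zip(scores, is_substrate) if not lab and s <= cutoff)


# Hypothetical reweighted scores and measured labels (True = substrate).
scores = [-1125.0, -1120.0, -1115.0, -1110.0, -1108.0, -1106.0, -1104.0, -1100.0]
labels = [True, True, False, True, False, True, True, False]
strict = loosest_cutoff(scores, labels, max_fp=0)
loose = loosest_cutoff(scores, labels, max_fp=1)
print(strict, loose, false_positives(scores, labels, loose))
```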

Figure 2

Performance of the calibrated protocol on different datasets (see Supplementary Table S2). (A) Correlation: predicted vs. measured activities on the D-TRAINING (dots; blue: substrates, red: non-substrates), D-CAPPED (yellow triangles), and D-TEST (grey squares) datasets. (B) Binary distinction: ROC curves on D-TRAINING (magenta), D-EXTENDED (cyan), and D-HPLC (brown). (C) Performance of the protocol on D-HPLC, a dataset measured using HPLC (from Kutil et al.3). In (A) and (C), a dashed horizontal line denotes the cutoff dividing measured substrates from non-substrates (kcat/KM = 10⁴ M⁻¹ s⁻¹). Note the different scales in (A) and (C). The cutoffs dividing predicted substrates from non-substrates are indicated by a dotted vertical line for the loose cutoff (reweighted score: −1105) and a dash-dotted vertical line for the strict cutoff (reweighted score: −1118).

As an independent validation, we applied our protocol to a set of 22 peptides with published activities measured by HPLC3 (the D-HPLC dataset, see Supplementary Table S1). Using the same activity cutoff (kcat/KM = 10⁴ M⁻¹ s⁻¹), this set is composed of 17 substrates, three non-substrates, and four additional non-substrates with borderline activities (i.e., 10⁴ M⁻¹ s⁻¹ > kcat/KM > 8 × 10³ M⁻¹ s⁻¹)3. Our protocol identified 9 of 16 substrates at the strict cutoff, and only one was missed at the loose cutoff. The three non-substrates were separated from the rest, lying near the loose cutoff. The four borderline non-substrates showed a range of predicted activities, and two of them passed the strict threshold and were thus predicted to be substrates (kcat/KM = 8.8 × 10³ M⁻¹ s⁻¹ and 8.3 × 10³ M⁻¹ s⁻¹) (Fig. 2C).

To assess whether performance depended on the specific crystal structure selected, we repeated the analysis using different structures. The HDAC6 DD2 domain has been solved several times bound to different ligands: (1) a cyclic peptide (PDB ID: 6WSJ30), (2) a tripeptide substrate attached to coumarin (PDB ID: 5EFN4), and (3) an inhibitor (PDB ID: 7JOM33). The best performance was achieved with the cyclic peptide-bound structure (6WSJ), while the structure with the inhibitory small molecule performed worst (Table 2B and Supplementary Fig. S1).

Predictions on the human acetylome

To detect new potential HDAC6 substrates, we used our calibrated protocol (refmin_thread applied to PDB ID 6WSJ) to screen the human acetylome (from PhosphoSitePlus34, focusing on peptides annotated from low-throughput experiments). The screen detected 74 peptides that scored better than the most active peptide in the D-TRAINING set (EGKFVR, reweighted score: −1123, blue line in Fig. 3A). Of the 1030 peptides, 215 and 859 were classified as substrates by our strict and loose cutoffs, respectively, belonging to 144 and 297 proteins (Fig. 3). Our previous study of HDAC8 specificity12 screened the same dataset. Compared with that study, a considerably larger fraction of peptides was predicted to be substrates for HDAC6 (21%, strict cutoff) than for HDAC8 (11%), in agreement with the reported greater promiscuity of HDAC6.
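The percentages above follow directly from the reported counts. The 21% corresponds to the strict cutoff (215 of 1030), whereas the loose cutoff retains about 83% of the peptides. The sketch reproduces this tally. The helper name is hypothetical, and the inclusive comparison (score at or below the cutoff) is an assumption.

```python
STRICT_CUTOFF = -1118.0  # reweighted score; more negative means stronger predicted binding
LOOSE_CUTOFF = -1105.0


def count_predicted(scores, cutoff):
    """Peptides scoring at or below the cutoff (inclusive comparison assumed)."""
    return sum(1 for s in scores if s <= cutoff)


# Counts reported in the text for the 1030 screened acetylome peptides.
n_total, n_strict, n_loose = 1030, 215, 859
print(f"strict: {100 * n_strict / n_total:.1f}%, loose: {100 * n_loose / n_total:.1f}%")
```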

Figure 3

Application of the calibrated protocol to the acetylome to detect novel potential HDAC6 substrates. (A) Blue dots and yellow dots show the distribution of scores for acetylated peptides in the D-TRAINING set and the acetylome peptides (as annotated in the PhosphoSitePlus database), respectively. Boxplots are drawn between the 1st and 3rd quartiles of the data and whiskers extend by 1.5 times the interquartile range. (B, C) Sequence logos of (B) substrates from the D-EXTENDED set, and (C) top 100 peptides predicted by our protocol.

We compared the sequence logo of the top-scoring acetylome peptides with the logo of the D-EXTENDED dataset (Fig. 3B, C). The comparison shows that the screen did not simply recreate the sequence specificity of the training data. There are some similarities, such as the preference for glycine at position P−1 and for tyrosine and phenylalanine at P+1. Nevertheless, the logo of the predicted peptides differs from that of the original dataset and shows more variability. This suggests that the protocol can identify a broader range of substrates.
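Logo letter heights follow the standard information-content construction. Each column's information is log2(20) bits minus its Shannon entropy, and each residue's height is its frequency times that information. The sketch omits the small-sample correction that some logo tools apply. Its peptides are hypothetical.

```python
import math
from collections import Counter


def logo_heights(peptides):
    """Per-position letter heights (bits) for an aligned peptide set."""
    max_bits = math.log2(20)  # maximum information for a 20-letter alphabet
    columns = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        n = sum(counts.values())
        freqs = {aa: c / n for aa, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        columns.append({aa: f * info for aa, f in freqs.items()})
    return columns


# Hypothetical hexamers (P-2..P+3); G at P-1 and K at P0 are fully conserved here.
heights = logo_heights(["EGKFVR", "SGKYLR", "AGKFIE", "DGKYVR"])
print({aa: round(h, 2) for aa, h in heights[3].items()})
```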

Our screen of the known acetylome recapitulated several previously identified substrates. These include K49 of β-catenin35 (reweighted score: −1129), K274 of the microtubule-associated protein tau 3336 (reweighted score: −1120), and K118 of the ATP-dependent RNA helicase DDX3X37 (reweighted score: −1107). Moreover, we identified 27 proteins previously reported to interact with HDAC6 (based on BioGRID38); for these, the interaction could be further regulated by enzymatic removal of acetylation. In addition to these known substrates and interactors, we report several novel hits among the top-scoring peptides (“Supplementary Data S1”).

Validating acetylome predictions

From the predicted substrates of the acetylome, we selected ten peptides (D-TEST) for additional validation using our acetate assay. Of these ten peptides, nine were indeed measured to be substrates (Table 3), albeit with poor correlation (ρ = −0.49 for the correctly identified substrates, Fig. 2A). In summary, our protocol robustly identifies HDAC6 substrates, but because of the modest correlation, its ability to predict actual substrate efficiency is limited.

Table 3 Validation of predicted substrates of the acetylome (D-TEST dataset). The peptides are sorted according to their experimentally measured HDAC6 substrate efficiencies.

Comparison with previous studies of HDAC6 shows little agreement on substrate selectivity

Several of the studies mentioned above have previously probed the substrate landscape of HDAC6. The substrates identified by these different experimental assays overlapped minimally at the protein level3. To examine these differences further, we compared the substrates identified by these high-throughput methods with the substrates predicted by our protocol.

A plot of hexamer peptides shared between D-SILAC and D-13MER shows poor agreement between the substrate sets of these two studies (Fig. 4A). Sequence logos of the respective substrate sets highlight the enrichment of certain residues relative to a proteome-level background (Fig. 4B–D) or to the non-substrate sets (Fig. 4E–G). Accordingly, they also show substantial differences between the datasets. We used the computed PSSMs (Supplementary Fig. S2) to cross-score the datasets (using different peptide lengths) and calculated the correlations between the experimental values and PSSM scores (Supplementary Fig. S3). The strongest, albeit still mediocre, correlation (ρ = 0.47) was obtained for the D-3MER dataset scored with its own PSSM. Next came the experimental values of the D-TRAINING dataset scored with the PSSMs of the D-13MER or D-SILAC datasets (ρ = 0.46 and 0.44, respectively). This may indicate that different experiments capture different aspects of selectivity and have different biases and limitations (see “Discussion”). It also means that a computational model that agreed with any single dataset considerably better than the datasets agree with each other would likely be overfitted.
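The cross-scoring step can be sketched as follows. A PSSM is built from the substrates of one dataset, peptides from another dataset are scored with it, and the scores are rank-correlated with that dataset's measured activities. This is a minimal sketch assuming a uniform background, a pseudocount of 1, and equal peptide lengths. The datasets and function names are hypothetical and do not reproduce the published PSSMs.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """log2-odds columns against a uniform background."""
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    columns = []
    for i in range(len(peptides[0])):
        counts = Counter(p[i] for p in peptides)
        columns.append({aa: math.log2((counts[aa] + pseudocount) / total * len(AMINO_ACIDS))
                        for aa in AMINO_ACIDS})
    return columns


def pssm_score(pssm, peptide):
    return sum(column[aa] for column, aa in zip(pssm, peptide))


def ranks(values):
    """1-based ranks; ties receive the average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


# Hypothetical: PSSM from dataset A substrates, applied to dataset B with activities.
pssm_a = build_pssm(["EGKFVR", "SGKYVR", "AGKFVR"])
peptides_b = ["EGKFVR", "AGKYLE", "DPKDDD"]
activity_b = [100.0, 50.0, 1.0]
rho = spearman([pssm_score(pssm_a, p) for p in peptides_b], activity_b)
print(round(rho, 2))
```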

Figure 4

Different datasets of experimentally determined HDAC6 substrates show minor overlap and agreement. Comparison of substrate specificities of different experimental datasets, based on their sequence logos and correlation of substrate activities. (A) Plot of substrate activities for peptides shared between the D-13MER3 and D-SILAC14 datasets. Blue and red mark substrates and non-substrates predicted by our protocol. Triangles indicate peptides with H/L ratio > 3 in the SILAC experiment or measured intensity > 2 in the peptide array, whose values have been capped to fit into the plot. Dashed lines indicate cutoffs for defining substrates and non-substrates; S, NS: substrates and non-substrates, respectively. (B–G) Sequence logos made with PSSMSearch89 using its default background of human sequences (B–D), and with Two Sample Logo90 using the non-substrate peptides of each dataset as background (E–G). (B, E) D-13MER, (C, F) D-SILAC, and (D, G) D-3MER datasets.

Discussion

In this study we have calibrated a structure-based protocol to characterize HDAC6 substrates, and applied this protocol to identify new potential deacetylation substrates. We trained the method on a set of selected peptides for which we measured catalytic activity, validated the model on a set of independently measured peptides and applied it to the human acetylome. In the following discussion we compare the selectivity determinants of HDAC6 and HDAC8, summarize potential roles for the newly predicted substrates, and point to challenges in the accurate and comprehensive characterization of a promiscuous enzyme such as HDAC6.

The substrate selectivity of an enzyme is determined by both the value of kcat/KM and the local concentration of the substrate. HDAC6 is expressed in almost all cell types (based on the Human Protein Atlas39) and, in contrast to other HDACs, is localized mainly in the cytosol40,41,42. HDAC8 has a similar expression profile but shuttles between the cytosol and the nucleus43,44,45, potentially giving it access to many more substrates. However, toward peptide substrates in vitro, HDAC6 shows much higher catalytic efficiency and greater promiscuity than previously reported for HDAC8. For example, the fastest measured peptide substrate of HDAC8, ZNF318 at K1275, has a kcat/KM value of 4.8 × 10³ M⁻¹ s⁻¹12. This is over 40-fold lower than the fastest HDAC6 substrate and would not qualify as a substrate for HDAC6. The activity of HDAC8 toward peptides is known to be much lower than its activity toward full-length protein substrates46. In contrast, the HDAC6 DD2 activity measured in this study is similar for peptides and full-length protein substrates (Supplementary Table S2). Whereas HDAC8 shows an approximately 100-fold increase in catalytic efficiency for the full-length substrate, the catalytic efficiencies of HDAC6 lie within a 2-fold range. This similarity has three implications. HDAC6 substrate preference is determined mainly by short-range interactions between the active site and the substrate. Distal protein–protein contacts do not enhance activity. Peptides are therefore suitable analogs of full-length substrates. We note that the high catalytic efficiency is linked to the low KM values of most peptides. Accurate measurements of catalytic efficiency were therefore essential for developing our protocol: with such low KM values, kcat/KM is easily underestimated when measured at a single substrate concentration.
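The underestimation follows from the Michaelis–Menten equation. Dividing v = kcat[E][S]/(KM + [S]) by [E][S] gives v/([E][S]) = kcat/(KM + [S]). This ratio approaches kcat/KM only when [S] is much smaller than KM, so a single-point estimate is too low by a factor of (KM + [S])/KM. The values in the sketch are hypothetical. They represent a substrate whose KM lies below the assay's detection range, measured at a concentration the assay can resolve.

```python
def apparent_efficiency(kcat, km, s):
    """kcat/KM inferred from one initial rate: v/([E][S]) = kcat/(KM + [S])."""
    return kcat / (km + s)


kcat, km, s = 1.0, 5e-6, 50e-6          # hypothetical: s^-1, M, M
true_eff = kcat / km                     # 2e5 M^-1 s^-1
single_point = apparent_efficiency(kcat, km, s)
print(f"underestimated {true_eff / single_point:.1f}-fold")  # factor = (KM + [S]) / KM
```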

Even with accurate activity measurements, predicting the selectivity of HDAC6 proved much more challenging than for its paralog HDAC8. For HDAC8, we obtained good predictions without introducing any receptor backbone flexibility12. We therefore explored structural differences between these proteins that could explain this finding. Comparison of the HDAC6 DD2 (PDB ID: 6WSJ30) and HDAC8 (PDB ID: 2V5W47) structures (Fig. 5) highlights one main difference: a loop that forms part of the binding pocket and could lead to differences in binding selectivity. In both enzymes, one residue hydrogen-bonds with the acetylated lysine to position the substrate: D101 in HDAC8 and S531 in HDAC6. This residue contacts the substrate backbone at a similar position in both, but stems from a very different loop backbone. This loop contributes to the pocket that accommodates the residue preceding the acetylated lysine (P−1). Indeed, this pocket is considerably smaller in HDAC6, which explains the significant enrichment for glycine at this position in the peptide libraries. In HDAC8, this loop also harbors Y100, whose hydroxyl group forms a hydrogen bond with the peptide backbone and provides an additional recognition feature. No residue corresponding to Y100 is found in HDAC6. The different loop conformation also gives the peptide residues following the acetylated lysine more freedom of movement, allowing HDAC6 to accommodate more substrates. A second loop lies near the pocket that accommodates the residues following the acetylated lysine (P+1 to P+3). This loop is considerably longer in HDAC6 (12 residues, 456–467, compared with seven residues in HDAC8) and more hydrophobic. Its larger size and greater hydrophobicity suggest that it may be more flexible, allowing the binding groove to adapt and resulting in a more promiscuous binding pattern. Indeed, we showed here that our protocol succeeds only with receptor backbone minimization, which moves this loop. This suggests that the pocket is restructured upon binding.

Figure 5

Key differences between HDAC6 and HDAC8 in the coordination of the substrate. The active sites of HDAC6 (orange) and HDAC8 (teal) are overlaid. For HDAC6, the receptor is from PDB ID 6WSJ and the substrate from 5EFN, shown for visualization purposes. For HDAC8, PDB ID 2V5W is used. The loops that distinguish the two HDACs are highlighted (orange and teal), together with conformations generated for modeling substrate activity (grey). Residues S531 (HDAC6) and D101 (HDAC8) are also highlighted; they hydrogen-bond (light blue: HDAC6, yellow: HDAC8) to the backbone of the acetylated substrate lysine. Positions W459 (HDAC6) and Y100 (HDAC8) are highlighted as well. The hydroxamic acid inhibitors are shown as sticks, and the catalytic Zn²⁺ ions as white spheres. See text for more details.

Earlier attempts to model the interactions of the measured peptides with HDAC6 gave only mediocre results, although multiple starting structures from the PDB were evaluated (e.g., 5EFN, 5EFK, 5EDU; data not shown). These structures were bound either to small-molecule inhibitors or to tripeptide substrates covalently linked to 7-amino-4-methylcoumarin. The predictions improved with the release of 6WSJ, in which HDAC6 DD2 is bound to a cyclic peptide (see Supplementary Fig. S1 and Table 2B). This highlights the importance of choosing the right starting structure, given the sensitivity of the approach to slight structural variations.

We next considered the proteins harboring the D-TRAINING peptides with the highest HDAC6 deacetylase activity (> 4 × 10⁴ M⁻¹ s⁻¹). As previously reported, these proteins either play a structural role or are associated with cellular structural elements (LMNA48, MYO1G49, ACTN150, TUBA1A5,51,52), or have chaperone functions (GRP9453, HSP907,8). Beyond these, our substrate prediction model suggests novel aspects of HDAC6 deacetylase function and impact.

Acetylation and ubiquitination reportedly compete for lysine residues: acetylation can prevent lysine ubiquitination54,55 or ubiquitin chain elongation56 and thereby protect against proteasomal degradation. We inspected the acetylated lysines in our predicted substrates, as well as their flanking regions, for other reported post-translational modifications. Of the 215 candidates that passed the strict score cutoff of −1118, 93 (43%) were annotated as ubiquitinated (data from PhosphoSitePlus57, see “Supplementary Data S1”). Furthermore, deacetylation of some of these sites has already been linked to promoting protein ubiquitination and degradation:
- K569 and K259 of Forkhead box protein O3 (FOXO358)
- K709 of Hypoxia inducible factor 1 alpha (HIF1A59)
- K887 and K1413 of Werner syndrome ATP-dependent helicase (WRN60)
- K406 of Chorion-specific transcription factor GCMa (GCM161)
- K540 of ATP-citrate synthase (ACLY62)

Further exploration of the interplay between HDAC6-mediated deacetylation and ubiquitination could illuminate important regulation of the biological pathways involved.

The value of the substrate prediction model lies in its ability to reinforce previously identified HDAC6 substrates and to open new HDAC6 functions to exploration. The two measured acetylome peptides with the highest HDAC6-catalyzed deacetylation efficiencies (kcat/KM = 4 × 10⁴ M⁻¹ s⁻¹) are derived from EGFR and TARDBP. TARDBP has previously been explored as a direct substrate of HDAC6. Acetylated TARDBP aggregates are found in patients with amyotrophic lateral sclerosis (ALS)63, and deacetylation of TARDBP by HDAC6 prevents its aggregation63. HDAC6 is known to regulate EGFR endocytic trafficking between the apical and basolateral membranes by deacetylating α-tubulin, thereby affecting the EGFR turnover rate64,65. Acetylation of EGFR has been shown to enhance its activity, an effect observed upon treatment with HDAC inhibitors66. Two high-scoring peptides derived from EGFR harbor residues K1179 and K1188 (reweighted scores: −1120 and −1122, respectively). We hypothesize that HDAC6 could regulate EGFR levels and activity not only by deacetylating EGFR interactors (i.e., α-tubulin), but also by directly deacetylating EGFR itself.

Related to the above, we performed a Reactome pathway analysis67 of the proteins harboring the 100 best-scoring peptides. Two pathways were significantly overrepresented relative to the human proteome background (FDR < 0.05): Estrogen-dependent gene expression (R-HSA-9018519, false discovery rate (FDR) = 0.01) and Transcriptional regulation of granulopoiesis (R-HSA-9616222, FDR = 0.03). However, the latter pathway, although significant, was also enriched when the whole acetylome was compared with the background. Several studies have linked HDAC6 to estrogen signaling, with estrogen upregulating HDAC6 levels68,69. We also identified the estrogen receptor as a possible substrate (K171, reweighted score: −1119).

One of our strongest predicted substrate peptides harbors K87 of nucleolin (NCL, reweighted score: −1130). This site is predicted to be part of a cyclin-dependent kinase (CDK) phosphorylation motif (MOD_CDK_SPxK_1 for K87 or MOD_CDK_SPxxK_3 for K88)16, in which the charge of the lysine is crucial for recognition70. Acetylation of this lysine would therefore probably hinder phosphorylation. Phosphorylation of T84 in NCL was shown to be crucial for its interaction with DEAD box polypeptide 31 (DDX31)71. This complex is important for activating EGFR/Akt signaling, which induces cell survival and proliferation72. Moreover, Akt was shown to be hypophosphorylated in HDAC6 knockout mice73. This could further explain the role of HDAC6 in tumorigenesis and tumor survival in breast cancer65,73, pointing to multi-level regulation of these processes.
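Over-representation statistics of this kind are commonly computed with a one-sided hypergeometric test followed by Benjamini–Hochberg correction across pathways. The sketch below assumes that scheme. It is not Reactome's implementation, and the counts in the usage lines are hypothetical.

```python
from math import comb


def overrepresentation_p(population, annotated, sample, hits):
    """One-sided hypergeometric P(X >= hits).

    population: background size; annotated: background members in the pathway;
    sample: size of the query list; hits: query members in the pathway.
    """
    upper = min(sample, annotated)
    tail = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(hits, upper + 1))
    return tail / comb(population, sample)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 100 query proteins, 6 of them in a 150-member pathway,
# against a background of 20000 proteins.
p = overrepresentation_p(20000, 150, 100, 6)
print(benjamini_hochberg([p, 0.04, 0.3]))
```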

The acetylome screen also revealed numerous predicted substrates involved in metabolism with important links to cancer development. Our second-best-scoring predicted substrate is K311 of glutaminase (GLS; reweighted score: −1134). This residue plays an important part in regulating GLS oligomerization, which is crucial for enzyme activation74. Removal of the acetyl group induces oligomerization and activation of GLS, which reduces oxidative stress in cancer cells and thereby promotes their survival75. Glycolytic enzymes have been proposed to be substrates of HDAC676. Our acetylome screen identified two glycolytic enzymes shown to interact with HDAC6: phosphoglycerate kinase 1 (PGK1)77 and pyruvate kinase muscle (PKM)78. The screen also identified other metabolic enzymes, including heme oxygenase 1 (HMOX1)79, ATP-citrate lyase (ACLY)62, and malic enzyme 1 (ME1)80. Acetylation of these enzymes has been connected to regulation of enzymatic activity leading to carcinogenesis. Our results suggest that the involvement of HDAC6 in cancer may be more closely related to metabolic pathways than previously proposed.

Several proteins identified in the acetylome screen are involved in the cellular response to oxidative stress. Notable among them are transcription factors that act through changes in their intracellular localization in response to cellular signaling, such as nuclear factor erythroid 2-related factor 2 (NRF2), HIF1A, FOXO3, and forkhead box protein O1 (FOXO1). Acetylation of FOXO1 has been connected to its retention in the cytoplasm, whereas deacetylation results in nuclear import/retention, allowing DNA binding and transcriptional activation81. The acetylome screen identified several FOXO1 acetylation sites as potential HDAC6 substrates.

Several acetylated sites (K1699, K1550, K1769, K1546) in histone acetyltransferase p300 (EP300) are predicted to be HDAC6 substrates. Although deacetylation of EP300 was previously linked to several other HDACs54,55, these data suggest that HDAC6 could also regulate this protein. This raises the intriguing possibility of mutual crosstalk via PTMs, since EP300 has been shown to acetylate HDAC6 and thereby modulate its ability to deacetylate tubulin56.

In this study, we developed a structure-based model of HDAC6 substrate selectivity and validated it experimentally on a number of substrates. Nevertheless, comparison with other published studies shows limited agreement among the HDAC6 substrates detected in each of them (Fig. 4). This disagreement could be due to differences in experimental setup. One potential difference is the use of only the DD2 domain of HDAC6 instead of the full-length protein. However, the studies discussed here (the D-3MER13, D-SILAC3 and D-13MER14 datasets) all used the full-length enzyme and still report little overlap (e.g., only six substrates are shared between the 52 and 27 substrates identified in the D-13MER and D-SILAC sets, respectively). Another source of variation could be the different lengths of the peptides and their flanking regions in the different experiments. Among the peptides common to the D-13MER and D-SILAC datasets, 171 and 89, respectively, had identical core hexamers but different flanking regions. Nevertheless, most of these were classified consistently as either substrates or non-substrates irrespective of their flanking regions (Supplementary Fig. S4). Closer inspection suggests that most of the remaining discrepancies could be explained by experimental measurement variation: outliers among the repeated experiments in the case of D-13MER, and protein inference differences in the case of D-SILAC, where assigning peptides with overlapping sequences to different proteins affects their quantification. It is therefore not surprising that PSSMs generated from the different datasets show low concordance (Fig. 4 and Supplementary Fig. S2).

None of these PSSMs could be used to generate predictions that correlate strongly with experimentally measured values (best correlation R = 0.47; see Supplementary Fig. S3). Of note, PSSMs treat peptide positions independently and do not incorporate additional information, such as secondary structure and disorder, and thus may not fully capture the determinants of substrate selectivity. Overall, this suggests that, owing to possible experimental biases, each experiment captures only part of the specificity, which would also explain our suboptimal performance on these datasets.

In vivo experiments evaluate HDAC6 substrate selectivity in its physiological context, including potential cofactors, binding proteins, and post-translational modifications of both the substrates and HDAC6. For example, HDAC6 displays different affinities toward tubulin dimers than toward assembled microtubules82. In in vitro experiments, peptides outside their native environment might not adopt the native secondary structure that may be important for recognition83. On the other hand, in vivo experiments can also detect downstream effects of treatments, thereby introducing bias into the results. Additionally, as with all high-throughput experiments, they are prone to amplifying noise. Furthermore, antibodies widely used to detect or enrich for acetylation might not be sufficiently sensitive or specific, or might be biased toward certain amino acids around the modification site84, and mass spectrometry is biased toward capturing peptides whose positive charge has not been removed by post-translational modifications such as acetylation85.

The present study provides, for the first time, accurate measurements of HDAC6 enzymatic activity on a large set of unlabeled peptides. The prediction model developed here enabled the discovery of novel HDAC6 substrates and opened new avenues for substrate exploration. Constructing a structure-based model for a second HDAC isozyme highlights the utility and limitations of this approach for predicting novel substrates of isozymes with very different substrate ranges. The promiscuity of HDAC6 is particularly apparent in our data, and the limited agreement among the different large-scale substrate detection experiments suggests that additional cellular regulatory mechanisms confer selectivity. Our prediction model illuminated possible structural determinants of the broad selectivity of HDAC6 compared with HDAC8 and suggested novel substrates and regulatory roles for HDAC6.

Materials and methods

Measuring enzyme activity

Reagents

High flow amylose resin was purchased from New England Biolabs, and Ni–NTA agarose was purchased from Qiagen. Adenosine triphosphate (ATP), coenzyme A (CoA), NAD+, NADH, L-malic acid, malate dehydrogenase (MDH), citrate synthase (CS), and mouse monoclonal anti-polyhistidine–alkaline phosphatase antibody were purchased from Sigma. Monoacetylated peptides, with N-terminal acetylation and C-terminal amidation, were purchased from Peptide 2.0 or Synthetic Biomolecules. A 3% (v/v) acetic acid standard was purchased from RICCA Chemical. All other materials were purchased from Fisher at >95% purity unless noted otherwise.

HDAC6 expression and purification

The plasmid and protocol for the expression and purification of Danio rerio HDAC6 catalytic domain 2 (DD2) were generously provided by David Christianson (University of Pennsylvania). The expression construct had been prepared previously by the Christianson lab by cloning residues 440–798 of the Danio rerio HDAC6 gene into a modified pET28a(+) vector in frame with a TEV-protease-cleavable N-terminal 6xHis-maltose binding protein (MBP) tag4. HDAC6 was expressed and purified as described4, with several alterations to optimize expression. BL21(DE3) E. coli cells (Novagen 69450-3) were transformed with the plasmid according to the protocol and plated on LB agar supplemented with 50 μg/mL kanamycin. Plates were incubated overnight (16–18 h) at 37 °C, and a single colony was used to inoculate an LB starter culture supplemented with 50 μg/mL kanamycin, which was incubated with shaking at 37 °C for 16–18 h. This overnight starter culture was diluted 1:200 into 2x-YT medium supplemented with 50 μg/mL kanamycin and incubated at 37 °C with shaking until the cell density reached an OD600 of 1. The cultures were then cooled to 18 °C for 1 h and supplemented with 100 μM ZnSO4 and 500 μM isopropyl β-D-1-thiogalactopyranoside (IPTG) to induce expression. The cultures were grown for an additional 16–18 h with shaking at 18 °C and harvested by centrifugation at 6000×g for 15 min at 4 °C. Cell pellets were stored at −80 °C. Pre- and post-induction samples (1 mL) were taken and tested for HDAC6 expression by anti-polyhistidine western blot and for activity using the commercial Fluor de Lys assay (Enzo Life Sciences).

Cell pellets were resuspended in running buffer (50 mM HEPES, pH 7.5, 300 mM KCl, 10% (v/v) glycerol, and 1 mM TCEP) supplemented with protease inhibitor tablets (Pierce) at 2 mL/g cell pellet. The cells were lysed by three passages through a chilled microfluidizer (Microfluidics), and the lysate was centrifuged for 1 h at 26,000×g at 4 °C. Using an AKTA Pure FPLC (GE) running at 2 mL/min, the cleared lysate was loaded onto a 10-mL packed Ni–NTA column equilibrated with running buffer. The column was washed with 10 column volumes (CVs) of running buffer and 10 CVs of running buffer containing 30 mM imidazole, and the protein was eluted with 5 CVs of elution buffer containing 500 mM imidazole. Fractions (8 mL) were collected and analyzed by SDS-PAGE and western blot, and fractions containing His-tagged HDAC6 were combined and loaded at 1 mL/min onto a 30-mL amylose column equilibrated with running buffer. The column was washed with 2 CVs of running buffer, and the protein was eluted with 5 CVs of running buffer supplemented with 20 mM maltose. Fractions containing HDAC6 were combined with His6x-TEV S219V protease (0.5 mg TEV protease/L culture), previously purified in-house86 from a commercially obtained plasmid (Addgene plasmid pRK739), and dialyzed overnight at 4 °C in 20 K molecular weight cut-off (MWCO) dialysis cassettes against a 200-fold volume of running buffer containing 20 mM imidazole. After dialysis, the sample was loaded at 2 mL/min onto a 10-mL Ni–NTA column pre-equilibrated with running buffer containing 50 mM imidazole. The column was washed with 5 CVs of running buffer containing 50 mM imidazole to elute cleaved HDAC6. Non-cleaved HDAC6 and His-tagged TEV protease were eluted with 20 CVs of a 50–500 mM linear imidazole gradient.
Fractions containing cleaved HDAC6 were combined, concentrated to <2 mL, and loaded at 0.5 mL/min onto a 26/60 Sephacryl S200 size exclusion chromatography (SEC) column (GE) equilibrated with SEC/storage buffer (50 mM HEPES, pH 7.5, 100 mM KCl, 5% glycerol, and 1 mM TCEP). Eluted peaks were tested for deacetylase activity, and active fractions were concentrated, aliquoted, flash-frozen in liquid nitrogen, and stored at −80 °C.

Acetyl-CoA synthetase (ACS) expression and purification

The pHD4-ACS-TEV-His6x expression vector was prepared previously by inserting the ACS gene from a chitin-tagged acetyl-CoA synthetase plasmid Acs/pTYB1, a generous gift from Andrew Gulick (Hauptman-Woodward Institute), into a pET vector containing a His6x tag to increase expression12,27. The pHD4-ACS-TEV-His6x construct was expressed and purified as previously described27.

Coupled acetate detection assay

The coupled acetate detection assay (the ‘acetate assay’) was performed as previously described27, with a few modifications. Briefly, lyophilized peptides were resuspended in water when possible, or with minimal quantities of acid, base, or organic solvent to improve solubility. Peptide concentration was determined by one or more of the following methods: (1) measuring A280 using the extinction coefficient if the peptide contained a tryptophan or tyrosine, or using the fluorescamine assay if the peptide contained a free lysine87; (2) performing the bicinchoninic acid (BCA) assay with bovine serum albumin (BSA) as a standard; and (3) determining the concentration of acetate produced by complete deacetylation of the peptide by HDAC6.
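Method (1) relies on the Beer–Lambert law, c = A280/(ε·l). The sketch below assumes the widely used per-residue extinction coefficients at 280 nm of 5500 M⁻¹ cm⁻¹ for Trp and 1490 M⁻¹ cm⁻¹ for Tyr and ignores cystine; the text does not state which coefficients were used, and the function names are hypothetical.

```python
# Assumed per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1);
# cystine is ignored, i.e. the peptides are assumed to lack disulfide bonds.
EXT_COEFF_280 = {"W": 5500, "Y": 1490}


def extinction_coefficient_280(sequence: str) -> int:
    """Sum the Trp and Tyr contributions for a one-letter peptide sequence."""
    seq = sequence.upper()
    return sum(seq.count(aa) * eps for aa, eps in EXT_COEFF_280.items())


def concentration_uM(a280: float, sequence: str, path_cm: float = 1.0,
                     dilution: float = 1.0) -> float:
    """Beer-Lambert: c = A / (epsilon * l), corrected for dilution and returned in μM."""
    eps = extinction_coefficient_280(sequence)
    if eps == 0:
        raise ValueError("no Trp/Tyr in sequence; use the fluorescamine or BCA assay instead")
    return a280 / (eps * path_cm) * dilution * 1e6


if __name__ == "__main__":
    # Hypothetical peptide and absorbance reading, for illustration only.
    print(concentration_uM(0.699, "GKWYAR"))
```

Raising an error for peptides without aromatic residues mirrors the fallback described above: such peptides have to be quantified by fluorescamine, BCA, or complete deacetylation instead.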

Reactions containing 10–2000 μM monoacetylated peptide in 1× HDAC6 assay buffer (50 mM HEPES, pH 8.0, 137 mM NaCl, 2.7 mM KCl, 1 mM MgCl2) were initiated with 0.1–1 μM HDAC6 at 30 °C. Timepoint aliquots (60 μL) were quenched with 5 μL of 10% hydrochloric acid and kept on ice until assay completion (no more than 90 min). Quenched timepoints were flash-frozen in liquid nitrogen and stored at −80 °C until work-up.

Coupled solution (50 mM HEPES, pH 8, 400 μM ATP, 10 μM NAD+, 30 μM CoA, 0.07 U/μL CS, 0.04 U/μL MDH, 50 μM ACS, 100 mM NaCl, 3 mM KCl, 50 mM MgCl2, and 2.5 mM L-malic acid) was prepared on the day of work-up and incubated at room temperature, protected from light, for at least 25 min. Timepoints were quickly thawed and neutralized with 15 μL of freshly prepared and filtered 6% sodium bicarbonate. Neutralized timepoints, or acetate or NADH standards (60 μL each), were added to 10 μL of coupled solution (or 1× assay buffer for NADH standards) in a black, flat-bottomed, half-area, non-binding 96-well plate (Corning No. 3686). The resulting NADH fluorescence (Ex = 340 nm, Em = 460 nm) of standards and timepoints was read on a PolarStar fluorescence plate reader at 1–3-min intervals until the signal reached equilibrium. The acetate and NADH standard curves were compared to verify the activity of the coupled solution. When possible, a positive control reaction for enzyme activity was included. Using the acetate standard curve, the fluorescence of each timepoint was converted to μM product, and the slopes of the linear portion of the reaction (<10%) were plotted against substrate concentration. Using GraphPad Prism, the Michaelis–Menten equation (Eq. 1) was fit to the dependence of the initial velocity on substrate concentration to determine the kinetic parameters kcat/KM, kcat, and KM, with standard errors calculated by GraphPad Prism. A total of 50 peptides were tested. Of these, 26 were used in the training set (D-TRAINING) and another 16 were used for validation of the protocol (D-CAPPED) (Table 1, Supplementary Table S1). The remaining eight peptides were either longer than six residues or their activity could not be measured precisely, so they were not used for further analysis.
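The conversion from fluorescence to product concentration, and the extraction of initial rates from the early timepoints, can be sketched with ordinary least squares. This is a minimal pure-Python illustration under stated assumptions: the acetate standard curve is taken to be linear, the "(<10%)" criterion is interpreted as less than 10% substrate conversion, and all function names are hypothetical; the actual analysis was performed in GraphPad Prism.

```python
from typing import List, Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fluorescence_to_uM(signal: Sequence[float], std_conc_uM: Sequence[float],
                       std_signal: Sequence[float]) -> List[float]:
    """Invert a linear acetate standard curve (signal vs. μM acetate)."""
    slope, intercept = linear_fit(std_conc_uM, std_signal)
    return [(s - intercept) / slope for s in signal]


def initial_rate(times_s: Sequence[float], product_uM: Sequence[float],
                 substrate_uM: float, max_conversion: float = 0.10) -> float:
    """Slope (μM/s) of product vs. time, using only points below max_conversion of substrate."""
    pts = [(t, p) for t, p in zip(times_s, product_uM) if p <= max_conversion * substrate_uM]
    if len(pts) < 2:
        raise ValueError("fewer than two timepoints in the linear (<10% conversion) range")
    slope, _ = linear_fit([t for t, _ in pts], [p for _, p in pts])
    return slope
```

Dividing each initial rate by the HDAC6 concentration in the reaction gives the v0/[E] values to which Eq. 1 is fit.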

$$\frac{v_{0}}{[E]}=\frac{k_{\mathrm{cat}}}{K_{M}}\cdot\frac{[S]}{\frac{[S]}{K_{M}}+1} \tag{1}$$
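Eq. 1 is the Michaelis–Menten equation parameterized directly in kcat/KM, so kcat/KM is fitted as a parameter and kcat follows as (kcat/KM)·KM. GraphPad Prism performs its own nonlinear least-squares fit; the sketch below swaps in a simple dependency-free alternative. Because Eq. 1 is linear in kcat/KM for a fixed KM, kcat/KM is solved in closed form and only KM is searched, by golden-section search on log KM. The search bounds, tolerance, and function names are assumptions, and no standard errors are computed.

```python
import math
from typing import Sequence, Tuple


def mm_rate(s: float, kcat_over_km: float, km: float) -> float:
    """Eq. 1: v0/[E] = (kcat/KM) * [S] / ([S]/KM + 1)."""
    return kcat_over_km * s / (s / km + 1.0)


def _profile(s_vals: Sequence[float], y_vals: Sequence[float], km: float) -> Tuple[float, float]:
    """For fixed KM, Eq. 1 is linear in kcat/KM: return its least-squares value and the SSE."""
    g = [s / (s / km + 1.0) for s in s_vals]
    a = sum(gi * yi for gi, yi in zip(g, y_vals)) / sum(gi * gi for gi in g)
    sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, y_vals))
    return a, sse


def fit_michaelis_menten(s_vals, y_vals, km_lo=1.0, km_hi=1e5, tol=1e-10):
    """Return (kcat/KM, KM, kcat); assumes a single SSE minimum for KM in [km_lo, km_hi]."""
    lo, hi = math.log(km_lo), math.log(km_hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio, about 0.618
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc = _profile(s_vals, y_vals, math.exp(c))[1]
    fd = _profile(s_vals, y_vals, math.exp(d))[1]
    while hi - lo > tol:
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = _profile(s_vals, y_vals, math.exp(c))[1]
        else:  # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = _profile(s_vals, y_vals, math.exp(d))[1]
    km = math.exp((lo + hi) / 2.0)
    kcat_over_km, _ = _profile(s_vals, y_vals, km)
    return kcat_over_km, km, kcat_over_km * km
```

With [S] in μM and v0/[E] in s⁻¹, the fitted kcat/KM is in μM⁻¹ s⁻¹; multiplying by 10⁶ converts it to M⁻¹ s⁻¹. Fitting kcat/KM directly is useful for poor substrates, where saturation is not reached within the 10–2000 μM range and kcat and KM individually are poorly determined.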

Calibration of FlexPepBind

The protocol implemented in this study is similar to the one used in our previous study of HDAC8 specificity12; below, we mainly highlight the differences. All Rosetta protocols described here used Rosetta v2020.28. See Supplementary Tables S6 and S7 for the command-line files and arguments used to run FlexPepDock.

Running FlexPepBind requires a starting structure from which to generate a template (or, as in the present study, a set of templates) for threading peptides. We used the structure of Danio rerio HDAC6 catalytic domain 2 (DD2) crystallized in complex with a cyclic peptide substrate (PDB ID: 6WSJ30). To enforce a catalysis-competent binding conformation, we defined constraints that characterize substrate binding, as revealed by the solved structures of HDAC6 bound to ligands. Constraints were defined for Rosetta runs as in our HDAC8 study12. These include: (1) interactions coordinating the proper binding of the Zn2+ ion required for enzymatic activity; (2) interactions between the acetylated lysine side chain and the binding pocket; and (3) a dihedral angle constraint in the peptide between residues 3 and 4 (i.e., adjacent to the acetylated lysine in the modeled hexamers) to enforce a cis-peptide bond (see Supplementary Table S6 for further details). All distances between interacting residues were measured on structure PDB ID 6WSJ. When comparing the final protocol across structures 5EFN, 6WSJ and 7JOM, the constraints were measured on each respective structure.
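For readers unfamiliar with Rosetta constraints, restraints of the kinds listed above map onto Rosetta's constraint-file syntax roughly as sketched below: distance restraints become AtomPair lines with a HARMONIC function (ideal distance and standard deviation, in Å), and the cis-peptide restraint becomes a Dihedral line on the ω angle (CA–C–N–CA) with a CIRCULARHARMONIC function centered at 0 rad. The residue numbers, atom choices, distances, and tolerances are placeholders, not the values used in this study (those are listed in Supplementary Table S6), and the helper names are hypothetical.

```python
from typing import List, Tuple


def atom_pair(atom1: str, res1: int, atom2: str, res2: int, x0: float, sd: float) -> str:
    """Harmonic distance restraint between two atoms (pose numbering; x0 and sd in Å)."""
    return f"AtomPair {atom1} {res1} {atom2} {res2} HARMONIC {x0:.2f} {sd:.2f}"


def cis_omega(res_i: int, sd_rad: float = 0.2) -> str:
    """Restrain the omega dihedral CA(i)-C(i)-N(i+1)-CA(i+1) to 0 rad, i.e. a cis peptide bond."""
    j = res_i + 1
    return f"Dihedral CA {res_i} C {res_i} N {j} CA {j} CIRCULARHARMONIC 0.00 {sd_rad:.2f}"


def build_constraints(zn_res: int, zn_ligands: List[Tuple[str, int, float]],
                      pep_start: int) -> List[str]:
    """Assemble constraint lines; zn_ligands holds placeholder (atom, residue, distance) triples."""
    lines = [atom_pair("ZN", zn_res, atom, res, dist, 0.2) for atom, res, dist in zn_ligands]
    kac = pep_start + 2  # the acetyl-lysine is residue 3 of the modeled hexamer
    lines.append(cis_omega(kac))  # peptide bond between hexamer residues 3 and 4
    return lines


if __name__ == "__main__":
    # Placeholder numbering: Zn as pose residue 400, peptide starting at pose residue 401.
    for line in build_constraints(400, [("OD1", 100, 2.0), ("ND1", 102, 2.1)], 401):
        print(line)
```

Restraints for the acetyl-lysine side chain in the pocket (item 2) would be written as further AtomPair lines in the same way; they are omitted here because their atom names depend on how the acetylated residue type is defined.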

The best substrate peptide (sequence: EGK(Ac)FVR) was built into the binding pocket of 5EFN using the trimer substrates and the corresponding atoms of coumarin, with the trailing three residues of the peptide added in an extended conformation. The resulting complex was then superimposed onto 6WSJ, and the peptide conformation was copied into its binding pocket. FlexPepDock was run on this structure with the constraints added, generating nstruct = 250 decoys with different setups (Table 2), both with and without receptor backbone minimization (refmin and ref protocols, respectively). The ref2015 scoring function88 was used, and the reweighted score (reweighted_sc) was found to discriminate substrates best (compared with total score and interface score). The reweighted score is calculated by summing Rosetta's total score, the interface score (I_sc) and the peptide score, giving double weight to interface residues and triple weight to peptide residues. Unlike in the HDAC8 study, where only the top-scoring structure was used, we selected the top five structures according to reweighted score. Every peptide of the training dataset was threaded onto each of these starting structures using the Rosetta fixbb protocol, followed by FlexPepDock with minimization only (see Supplementary Tables S6 and S7 for command lines and parameters). Again, these simulations were run with or without receptor backbone minimization (threadmin and thread, respectively), resulting in four candidate protocols: refmin_threadmin, refmin_thread, ref_threadmin, and ref_thread. For each peptide sequence, the best reweighted score among the five templates was used to represent its substrate strength.
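The template selection and per-peptide scoring described above can be summarized as follows. This sketch assumes per-decoy scores have already been parsed into dictionaries; the keys `total_score` and `I_sc` follow Rosetta score-file naming, whereas `peptide_score` is a hypothetical key for the peptide score term, and all function names are hypothetical. Lower Rosetta scores are better, so "best" means minimum.

```python
from typing import Dict, List


def reweighted_score(decoy: Dict[str, float]) -> float:
    """Sum of total, interface and peptide scores, as described in the text."""
    return decoy["total_score"] + decoy["I_sc"] + decoy["peptide_score"]


def select_templates(decoys: List[Dict[str, float]], n: int = 5) -> List[Dict[str, float]]:
    """Keep the n decoys with the lowest (best) reweighted score as threading templates."""
    return sorted(decoys, key=reweighted_score)[:n]


def substrate_strength(scores_by_template: Dict[str, float]) -> float:
    """A peptide's strength is its best (lowest) reweighted score over all templates."""
    return min(scores_by_template.values())


def rank_peptides(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order peptides from predicted strongest to weakest substrate."""
    return sorted(scores, key=lambda pep: substrate_strength(scores[pep]))
```

Taking the minimum over several templates lets each peptide "choose" the receptor conformation that suits it best, which is one way to account for some receptor flexibility without re-docking every sequence.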

Comparison of datasets

The PSSMs used to compare the different datasets were generated with PSSMSearch89, using PSI-BLAST as the scoring method. Substrates were defined according to the thresholds applied in the original studies, except for the D-3MER dataset, for which no such threshold was provided; for this dataset, substrates were taken from the first five of a total of 15 bins. Differential sequence logos were generated with the Two Sample Logo standalone program90 using default settings. Correlation and AUC calculations and visualizations were performed in R (v3.5.1) using the corrplot (v0.84) and pROC (v1.16.291) packages, respectively.
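The AUC values were computed with pROC in R. For illustration, the empirical ROC AUC equals the Mann–Whitney probability that a randomly chosen substrate is ranked ahead of a randomly chosen non-substrate, with ties counted as one half; the pure-Python sketch below computes that quantity. Lower scores are treated as better by default, matching Rosetta's reweighted score; the function name is hypothetical, and unlike pROC it does not infer the direction automatically.

```python
from typing import Sequence


def roc_auc(substrate_scores: Sequence[float], nonsubstrate_scores: Sequence[float],
            lower_is_better: bool = True) -> float:
    """Empirical AUC: fraction of (substrate, non-substrate) pairs ranked correctly, ties = 0.5."""
    if not substrate_scores or not nonsubstrate_scores:
        raise ValueError("both classes must be non-empty")
    wins = 0.0
    for s in substrate_scores:
        for n in nonsubstrate_scores:
            if s == n:
                wins += 0.5
            elif (s < n) == lower_is_better:
                wins += 1.0
    return wins / (len(substrate_scores) * len(nonsubstrate_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of substrates from non-substrates. The pairwise loop is quadratic, which is fine for datasets of the size used here.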

To score peptides with the PSSM of the D-3MER dataset, we used only the two residues preceding the acetylated lysine in the substrate. We also scored the peptides using only the P-1 and P-2 values of the D-SILAC and D-13MER PSSMs. In summary, every 3-mer peptide has three different scores and every 13-mer peptide has seven (four for the full-length peptide, from the D-SILAC and D-13MER PSSMs with either 13 or 6 scored positions, and three for the core 3-mer).
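Because a PSSM treats positions independently, a peptide's score is simply the sum of the matrix entries for its residues at the scored positions, and restricting scoring to P-2 and P-1 amounts to summing over a subset of positions. The sketch below illustrates this; the matrix values are hypothetical, positions are indexed as offsets from the acetyl-lysine (P0), residues absent from the matrix are assumed to contribute zero, and the function name is hypothetical (the study itself used PSSMSearch for scoring).

```python
from typing import Dict, Optional, Sequence

Pssm = Dict[int, Dict[str, float]]  # offset from the acetyl-lysine -> residue -> score


def pssm_score(peptide: str, pssm: Pssm, kac_index: int,
               positions: Optional[Sequence[int]] = None) -> float:
    """Additive PSSM score of `peptide`, whose acetyl-lysine sits at 0-based `kac_index`."""
    if positions is None:
        positions = sorted(pssm)
    total = 0.0
    for offset in positions:
        i = kac_index + offset
        if 0 <= i < len(peptide):  # offsets falling outside the peptide are skipped
            total += pssm[offset].get(peptide[i], 0.0)
    return total


# Hypothetical toy matrix covering P-2..P0 only.
toy = {-2: {"E": -0.2, "G": 0.5}, -1: {"G": 1.0}, 0: {"K": 2.0}}
full = pssm_score("EGKFVR", toy, kac_index=2)
leading = pssm_score("EGKFVR", toy, kac_index=2, positions=(-2, -1))
```

The additivity is exactly what limits PSSMs, as noted in the Discussion: no term couples one position to another or to the peptide's conformation.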

Peptides common to the D-13MER and D-SILAC datasets were selected to compare the two approaches. Non-substrates were defined as peptides with an H/L ratio between 0.8 and 1.2 in the D-SILAC dataset, and as peptides identified as substrates in at most one replicate in the D-13MER dataset. Substrates were defined by an enrichment of 2 or above in the D-SILAC set and by identification in three experiments in the D-13MER set. We note the large difference in the number of non-substrates between the training and test sets, which precludes a rigorous comparison of performance on these sets.
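The labeling rules above can be written out explicitly. This sketch assumes that the D-SILAC enrichment threshold applies to the same H/L ratio used for non-substrates, that "three experiments" means at least three replicates, and that peptides falling between the thresholds are excluded; the function names are hypothetical.

```python
from typing import Optional


def label_dsilac(h_l_ratio: float) -> Optional[str]:
    """D-SILAC: >= 2 -> substrate, 0.8-1.2 -> non-substrate, otherwise excluded (None)."""
    if h_l_ratio >= 2.0:
        return "substrate"
    if 0.8 <= h_l_ratio <= 1.2:
        return "non-substrate"
    return None


def label_d13mer(replicates_called_substrate: int) -> Optional[str]:
    """D-13MER: substrate in >= 3 replicates -> substrate, in at most 1 -> non-substrate."""
    if replicates_called_substrate >= 3:
        return "substrate"
    if replicates_called_substrate <= 1:
        return "non-substrate"
    return None


def concordant(h_l_ratio: float, replicates_called_substrate: int) -> Optional[bool]:
    """Whether the two datasets agree on a shared peptide; None if either label is missing."""
    a, b = label_dsilac(h_l_ratio), label_d13mer(replicates_called_substrate)
    if a is None or b is None:
        return None
    return a == b
```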

Running the calibrated protocol on the acetylome

The human acetylome dataset was extracted from the PhosphoSitePlus database (downloaded on 23/09/2020)34. For sites on human proteins supported by at least one low-throughput experiment, hexamer peptides were derived comprising the two residues preceding the modification site, the modified lysine, and the three residues following it. Only peptides spanning a full hexamer (i.e., with the modification site not too close to either protein terminus) were selected. BioGRID data38 for HDAC6 were downloaded on 11/09/2020 from database version 4.1.190. For pathway analysis, we used Reactome (Pathway Browser version 3.7, database release 74). Motifs were predicted using the Eukaryotic Linear Motif resource16.
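The hexamer extraction described above can be sketched as follows, assuming 1-based site positions on the protein sequence (the convention behind residue labels such as K87); the function name is hypothetical.

```python
from typing import Optional


def hexamer_around(sequence: str, site: int) -> Optional[str]:
    """Return residues P-2..P+3 around a 1-based lysine `site`, or None if they run off a terminus."""
    i = site - 1  # convert to a 0-based index
    if not 0 <= i < len(sequence) or sequence[i] != "K":
        raise ValueError(f"position {site} is not a lysine in the sequence")
    start, end = i - 2, i + 4  # two leading residues, the lysine, three trailing residues
    if start < 0 or end > len(sequence):
        return None  # modification too close to the N- or C-terminus for a full hexamer
    return sequence[start:end]
```

Returning `None` for sites near the termini implements the full-hexamer filter, so such sites are simply dropped from the screened set.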