Protein functional constraints are manifest as superfamily and functional-subgroup conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These load ring-shaped sliding clamps onto DNA to keep polymerase attached during replication and contain one δ, three γ, and one δ’ AAA+ subunits semi-circularly arranged in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies, as functionally-congruent features linking allosterically the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’-residues and six γ/δ’-covariant residue pairs within the DNA binding N-termini of helices α2 and α3; and γ/δ’-residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutation of this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis—implying a remarkably clamp-loader-specific function.
An important question in biology is which sequence and structural features enable proteins sharing a common catalytic core to perform entirely different functions. Consider, for example, AAA+ ATPases, which mediate a wide variety of cellular activities, including membrane fusion, DNA replication, microtubule dynamics, intracellular transport, transcriptional activation, protein refolding or degradation, and the disassembly of protein complexes1,2. These form homomeric or heteromeric complexes consisting of five to seven AAA+ modules with ATP-binding sites typically interacting with an adjacent module. Each complex channels the energy of ATP hydrolysis into coordinated conformational changes specific to its function. Although we cannot directly observe the biochemical mechanisms mediating these processes, given enough sequence data we can infer mechanistically imposed constraints. The nature of these constraints varies. They may appear as residues conserved in an entire superfamily or in functionally related protein subgroups (i.e., as correlations between sequence patterns and biochemical properties), as subtle pairwise correlations, or as correlations among these sequence features or with structural features.
Previously investigated protein constraints include function determining residues (FDRs), “coevolving sectors”, directly coupled (DC) residue pairs, and subgroup-specific patterns. FDR methods3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 generally focus on predicting specific, well-characterized residue functions, such as in substrate recognition and catalysis, that can be benchmarked experimentally25. However, due to the incompleteness of experimental annotations, we lack reliable gold standards for important but as yet undiscovered residue functions26,27, which, along with objectively characterizing protein constraints, is the focus of our investigation here. One may uncover new residue functions by applying statistical methods—which most FDR approaches lack—to distinguish signal from noise within large data sets. Methods that focus on evolutionary changes within a phylogenetic tree28,29 make no presuppositions concerning residue functions, but cannot adequately analyze large numbers of sequences due to the need to construct a tree, which, for large data sets, introduces more complexity than either is necessary or can be reliably inferred.
Statistical Coupling Analysis (SCA)30 applies principal component analysis to a multiple sequence alignment (MSA) covariance matrix to identify groups of “coevolving protein sectors”31 that are believed to arise from selection acting upon protein functional properties32. SCA has been used to predict hydrophobic cavities33 and surface sites34 involved in allosteric regulation and to design proteins35. However, most published SCA studies identify a single sector, for which, it has been suggested36, statistically equivalent predictions may be made using sequence conservation alone. If so, then SCA may be most useful when multiple sectors are present.
Direct coupling analysis (DCA)37 is similar to SCA but uses a different algorithmic approach38 and therefore extracts different biologically relevant information39. DCA focuses on predicting contacts between residue pairs based on correlated substitution patterns among homologous proteins: In order to maintain structural integrity, substitutions at one residue position often result in compensating substitutions at other positions over evolutionary time. Hence, in principle, MSA covariance analysis can predict structural contacts. However, early approaches fell short of expectations due to the confounding effect of indirect correlations: When residues correlate both at positions i and j and at positions j and k, then residues at positions i and k may also correlate even though they fail to interact directly. DCA39,40,41,42,43,44,45,46 overcomes this problem by disentangling direct from indirect correlations. DCA employs a variety of algorithmic strategies, including sparse inverse covariance estimation41, multivariate Gaussian modeling47, and pseudo-likelihood maximum entropy optimization43,44,48; among these, the last strategy (as implemented in CCMpred48) performed best based on the estimated significance of the overlap between high DC-scores and 3D-contacts49.
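The indirect-correlation problem can be illustrated with a toy calculation. The sketch below is not DCA itself: it assumes three real-valued, unit-variance variables standing in for alignment columns i, j, and k, arranged in a chain in which i is coupled to j and j to k with an arbitrarily chosen strength of 0.8. Inverting the covariance matrix, the idea underlying the Gaussian and inverse-covariance strategies, yields partial correlations in which the indirect i-k term vanishes while the direct i-j and j-k terms remain.

```python
import math


def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def partial_correlations(cov):
    """Partial correlation of each pair, conditioning on the remaining variable."""
    prec = inverse_3x3(cov)  # precision (inverse covariance) matrix
    n = len(prec)
    return [
        [
            1.0 if r == c else -prec[r][c] / math.sqrt(prec[r][r] * prec[c][c])
            for c in range(n)
        ]
        for r in range(n)
    ]


# Assumed toy model: chain i -> j -> k with coupling 0.8 and unit variances,
# so cov(i, k) = 0.8 * 0.8 even though i and k never interact directly.
coupling = 0.8
cov = [
    [1.0, coupling, coupling * coupling],
    [coupling, 1.0, coupling],
    [coupling * coupling, coupling, 1.0],
]
pc = partial_correlations(cov)

print("raw correlation i-k:    ", cov[0][2])
print("partial correlation i-k:", pc[0][2])
print("partial correlation i-j:", pc[0][1])
```

Real alignments are categorical and far larger, which is why practical DCA implementations rely on regularized or pseudo-likelihood estimators rather than a direct matrix inverse.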
Bayesian Partitioning with Pattern Selection (BPPS)50,51,52, like DCA, identifies correlations among columns in an MSA, but unlike DCA, focuses on residues co-conserved among functionally related subgroups. Using Markov chain Monte Carlo sampling, BPPS partitions an MSA into hierarchically arranged subgroups, each defined by a corresponding conserved pattern that best distinguishes that subgroup from those further up the hierarchy. The overlap is typically weak between BPPS pattern residues and either FDRs53 or high scoring DCA residue pairs49. Moreover, as illustrated here, BPPS enhances the utility of DCA by allowing characterization of direct couplings specific to a functionally-divergent subgroup. Hence, DCA and BPPS are complementary, with a combined analysis often providing deeper biological insight.
Here we describe Deep Analysis of Residue Constraints (DARC), which applies unsupervised machine learning based on both Bayesian and frequentist statistical modeling to perform a multifaceted analysis of residue constraints. It applies regularization methods to avoid over-fitting during model selection. When applied to bacterial DNA clamp loader AAA+ subunits, DARC reveals highly distinctive, biologically interpretable features, the most striking of which is a hydroxyl group that interacts in trans with an adjacent active site. Biochemical analyses reveal that this hydroxyl group is involved in key DNA clamp-loader-specific functions.
Deep analysis of residue constraints
Given a (typically very large) multiple sequence alignment (MSA) with a specified sequence as a query, DARC hierarchically partitions the MSA into one or more query-related subgroups, each defined by a pattern that most distinguishes that subgroup’s sequences from other, closely related sequences. Such patterns presumably are due to constraints (denoted as CP) imposed on residues determining the functional specificity of proteins within the query’s lineage (i.e., the query’s family, subfamily, etc.). The root of the hierarchy is defined by a pattern distinguishing the entire superfamily from unrelated proteins. DARC also performs DCA48,54 to predict structural contacts based on pairwise constraints (denoted as CDC), measured as a direct coupling (DC)-score between each pair of columns within the query family sub-alignment. DARC estimates the statistical significance (3DSDC) of the correlation between DC-scores and 3D contacts (3DCDC). Viewing direct couplings as functional constraints, 3DSDC serves as a measure of the degree to which a given 3D structure is in a functionally relevant conformation. Likewise, for the correlation between pattern residues and DC-scores (DCCP) and between pattern residues and 3D-contacts (3DCP), DARC computes statistical significance scores DCSP and 3DSP, respectively. An insignificant value for DCSP suggests that CDC and CP are complementary, so that a joint analysis may provide deeper biological insight. DARC likewise computes a statistical significance score (CLSP) for constraints (CLCP) tending to cluster pattern residues together spatially. To help identify determinants of protein functional specificity, DARC highlights within sequence alignments and available structures those residues subject to the strongest of each type of constraint. A DARC workflow diagram is shown in Fig. 1. Detailed descriptions of DARC algorithms and statistical models are provided as Supplementary Information.
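To give a sense of what a significance score for the overlap between high DC-scores and 3D contacts measures, the sketch below uses a plain hypergeometric tail probability. This is a stand-in chosen for illustration only; DARC's actual statistical models are those described in the Supplementary Information, and the function name, arguments, and example counts here are hypothetical.

```python
from math import comb


def overlap_p_value(total_pairs, contact_pairs, top_pairs, hits):
    """P(X >= hits) under random selection.

    total_pairs   -- number of column pairs considered
    contact_pairs -- how many of those pairs are in 3D contact
    top_pairs     -- how many highest DC-scoring pairs were selected
    hits          -- how many selected pairs are contacts
    """
    denom = comb(total_pairs, top_pairs)
    upper = min(contact_pairs, top_pairs)
    tail = sum(
        comb(contact_pairs, x) * comb(total_pairs - contact_pairs, top_pairs - x)
        for x in range(hits, upper + 1)
    )
    return tail / denom


# Hypothetical counts, not taken from the study.
p = overlap_p_value(total_pairs=1000, contact_pairs=50, top_pairs=20, hits=10)
print(f"p = {p:.3g}")
```

A small p-value indicates that the top-scoring pairs are enriched for contacts well beyond what random selection would give, which is the intuition behind using such a score to judge whether a structure is in a functionally relevant conformation.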
Identification and alignment of AAA+ sequences
We used MAPGAPS55 with a curated hierarchical MSA of AAA+ NTPases as the query to search the January 10, 2018 release of the NCBI nr database and the April 8, 2016 releases of the env_nr and translated EST databases56, obtaining over a million multiply aligned AAA+ sequences. We removed fragments (i.e., those with >25% deletions) and all but one among those sharing ≥95% sequence identity, yielding an MSA of 474,040 AAA+ modules. However, all (non-fragment) sequences corresponding to known structures detected by MAPGAPS were retained in the MSA. To check for reproducibility, we repeated the search and our analysis using the February 20, 2019 release of the nr database and the same env_nr and EST databases, yielding an MSA of 533,844 AAA+ subunits.
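The two filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: deletions are counted as '-' characters in the aligned sequence, identity is computed over columns where both sequences have a residue, redundancy is removed greedily in input order, and the `protected` argument (a hypothetical name) stands for sequences with known structures that are retained even when redundant. The all-against-kept comparison is quadratic, so an alignment of a million sequences would need a dedicated tool.

```python
def deletion_fraction(aligned_seq):
    """Fraction of alignment columns in which this sequence has a gap ('-')."""
    return aligned_seq.count("-") / len(aligned_seq)


def fractional_identity(a, b):
    """Identity over columns where both sequences have a residue (assumed definition)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_alignment(seqs, max_deletions=0.25, max_identity=0.95,
                     protected=frozenset()):
    """Drop fragments, then keep one representative per redundant group."""
    kept = {}
    for name, seq in seqs.items():
        if deletion_fraction(seq) > max_deletions:
            continue  # fragment: more than 25% deletions
        redundant = any(
            fractional_identity(seq, other) >= max_identity
            for other in kept.values()
        )
        if redundant and name not in protected:
            continue  # at least 95% identical to a sequence already kept
        kept[name] = seq
    return kept


# Toy aligned sequences (hypothetical).
toy = {
    "a": "ACDEFGHIKLMNPQRSTVWY",
    "b": "ACDEFGHIKLMNPQRSTVWY",  # identical to "a"
    "c": "ACDEFGHIKL----------",  # 50% deletions
    "d": "YWVTSRQPNMLKIHGFEDCA",  # unrelated to "a"
}
print(list(filter_alignment(toy)))
print(list(filter_alignment(toy, protected={"b"})))
```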
E. coli DNA clamp loader assays
Equilibrium β clamp binding and opening assays and calculations of corresponding equilibrium constants, equilibrium DNA binding assays, steady state ATP hydrolysis assays, and thermal stability assays were performed as described in Supplementary Information.
Bacterial clamp loader determinants of functional specificity
Bacterial DNA clamps are composed of two identical β subunits of DNA polymerase III57 that encircle DNA and bind to DNA polymerase to prevent premature dissociation during replication58,59,60. The ATP-bound bacterial clamp loader binds to and opens the β clamp at a homodimeric interface and, upon association with primed DNA, undergoes ATP hydrolysis to dissociate from both the clamp and DNA, thereby loading the clamp onto DNA61,62,63. The minimal clamp loader complex is comprised of five AAA+ subunits—one δ, three γ and one δ’—arranged semi-circularly in the order δ-γ1-γ2-γ3-δ’64. Only γ is an active ATPase, though both γ and δ’ functionally influence an adjacent γ subunit. Figure 2a,b show the structural features of the γ subunit. The coordinated conformational changes required for clamp loading depend on interactions within and between these subunits and with ATP, the β clamp, and DNA61,62,63. DARC associates two sets of pattern residues with clamp loader functional specificity: Residues conserved in γ but not in δ’ (termed γ-residues) (Fig. 3a) and residues conserved in γ and δ’ but not in other AAA+ proteins (termed γ/δ’-residues) (Fig. 3b). Among the 14 structures available for the E. coli clamp loader complex, DARC assigns the highest significance overall (3DSDC + 3DSP) to the structure of the complex bound to primer DNA and an ATP analog (pdb_id: 3glf)65 (Table S1). Figure 2c–g show the locations of pattern residues and of the highest DC-scoring clamp-loader-specific residue pairs within this structure.
Both γ/δ’- and γ-residues cluster around the catalytic base
Within the DNA + ATP bound complex (pdb: 3glf), the γ/δ’-residues, six of which interact in trans with an adjacent γ active site (Fig. 2c), tend to cluster structurally around L140-γ/L129-δ’ (Fig. 2d) with high significance (CLSP: p = 6.7 × 10⁻¹⁰). L140-γ/L129-δ’ contacts two AAA+ catalytic residues: the catalytic base in the adjacent γ subunit (E127-γ) and a trans-acting arginine (R-)finger1 in the same subunit (R169-γ/R158-δ’). L140-γ/L129-δ’ also packs up against T165-γ/T154-δ’ (Fig. 2c), the most distinctive γ/δ’-residue (Fig. 3b), which, for the reasons given below, is termed the “γ/δ’-trans-acting threonine” (taThrγδ’) and which 99% of other AAA+ proteins lack. The γ-residues (Fig. 3a), many of which interact with γ/δ’-residues in the adjacent subunit (Fig. 2c,d), likewise cluster around the catalytic base E127-γ with high significance (CLSP: p = 1.8 × 10⁻⁹). Hence, the catalytic base is a focal point of both γ- and γ/δ’-residues. Most of the γ/δ’-residues occur within or contact (in cis) either the α4 helix, which contains both the taThrγδ’ and the R-finger, or the α2 and α3 helices, the N-terminal ends of which interact with the negatively charged phosphate backbone of DNA via their positive dipole moments65.
High DC-scoring pairs and γ/δ’-residues associated with DNA binding
Among the 20 highest DC-scoring residue pairs within domain I (Table 1), four couple adjacent regions to the α1 helix, the N-terminus of which interacts with phosphate groups of ATP and harbors the Walker A lysine residue K51; only one of the four pairs is clamp loader specific (green rod in Fig. 2d). Remarkably, the N-terminal ends of helices α2 and α3, which bind DNA, are joined together by six of the top DC-scoring pairs (magenta rods in Fig. 2d,f), at least five of which are clamp loader specific (Table 1; Fig. S2). In E. coli γ, all six pairs involve residues able to form hydrogen-bonding interactions with DNA, namely K100-γ, H134-γ, R98-γ, S132-γ, and T99-γ (pdb_id: 3glf)65. Several γ/δ’-residues occur near this end of the α2 helix, including R105-γ/R94-δ’, which can also form a hydrogen bonding interaction with DNA65. Three γ/δ’-residues within the α3 helix form inter-subunit ionic bonds with four γ-residues near the active site: E144-γ/E133-δ’ with R11-γ and R215-γ, E145-γ/E134-δ’ with R56-γ, and K141-γ/K130-δ’ with E92-γ (Fig. 2c); both R11-γ and R215-γ (the sensor 2 arginine) interact with ATP phosphate groups (Fig. 2c). Together, these interactions may allosterically link ATP binding to DNA binding or ATP hydrolysis to DNA release.
Structural features associated with clamp binding
DNA clamp loader AAA+ subunits (including eukaryotic, archaeal, and bacteriophage clamp loaders) all conserve a lysine residue (K121-γ/K110-δ’ in Fig. 2f,g) at the N-terminal end of the β3 strand66, at the other end of which is the catalytic base E127-γ. This lysine residue shows up as a γ/δ’-residue because it is absent from essentially all non-clamp loader AAA+ proteins. Its positively charged sidechain is positioned to interact with the C-terminal negative dipole moment of the DNA-binding α2 helix, which connects to the β3 strand via a loop (termed here the C2 loop) predicted to bind to the β clamp based on homology to clamp-bound structures of both eukaryotic (pdb_id: 1sxj)67 and bacteriophage DNA clamp loaders (pdb_id: 3u5z, 3u60, 3u61)68. Because the clamp loader lysine is at the C-terminal end of the C2 loop (Fig. 2f), an ionic interaction with the C-terminal end of α2 would form a bridge connecting both ends of the C2 loop—perhaps thereby forming a conformation favoring clamp binding or release. Another γ/δ’-residue D89-γ/D75-δ’ occurs at the C-terminal end of another loop (termed the C1 loop), which similarly is predicted to bind to the clamp, and which is attached to the N-terminal end of the β2-strand; β2 is structurally adjacent to β3, which harbors K121-γ. D89-γ is sequence adjacent to L90-γ, which forms a high DC-scoring pair with β3 and is positioned to form hydrogen bonds with three backbone nitrogen atoms of the C2 loop; the C2 loop is also linked to the C1 loop by another DC-pair (F87-G118-γ in Fig. 2f and H73-G107-δ’ in Fig. 2g). The C1 loop connects, via a short helix, to the zinc binding insertion characteristic of bacterial clamp loader subunits, but not other clamp loaders. A high DC-scoring pair (K60-E83-γ; Table 1, Fig. 2d) couples this short helix to the α1 helix, the N-terminal end of which interacts with ATP. 
The γ-residue N63-γ occurs at the C-terminal end of the α1 helix, where it may form hydrogen bonds with backbone atoms of the C2 loop (Fig. 2f), and thus may play a role in clamp binding or in (ATP-hydrolysis-coupled) clamp release. An association between N63-γ and ATP hydrolysis is suggested by the absence of this residue from many (inactive) δ’ subunits (Figs. 3, S1a), which instead more often conserve a histidine (H73-δ’) and a proline (P74-δ’) preceding D75-δ’ (Fig. 2g). Together these C1- and C2-loop-associated features may form allosteric pathways involving ATP, the β-clamp, and DNA.
Mutagenesis of the taThrγδ’ hydroxyl to a methyl group
The taThrγδ’ most distinguishes γ and δ’ from other AAA+ proteins (Fig. 3b). It is near the trans-acting R-finger in the same subunit and is about the same distance from the γ-phosphate of ATP as is the cis-acting sensor 1 threonine (T157-γ), which is involved in ATP hydrolysis in Hsp104 (ref. 69) and in coupling hydrolysis to restructuring of σ54-RNA polymerase in PspF (ref. 70). Thus taThrγδ’ may sense the presence of the γ phosphate of ATP, modulate ATP hydrolysis, or help channel ATP binding or hydrolysis into conformational changes. Among the γ/δ’-residues, it is closest (in 3glf) to the center of the γ/δ’ cluster, that is, to L140-γ/L129-δ’, which contacts the adjacent γ-subunit catalytic base E127-γ at the center of the γ-residue cluster. In principle, the taThrγδ’ hydroxyl group could form a hydrogen bond (either directly or indirectly via a water molecule) with the catalytic base, with the R-finger, or with ATP. To investigate the possible role of this hydroxyl group, we mutated taThrγδ’ within δ’ and γ to a valine, which merely changes the hydroxyl to a methyl group and thereby avoids the confounding conformational changes that more severe mutations might introduce. Activities were measured for each step in the clamp loading reaction for three different mutant complexes: one with three T165V-γ mutations; one with a T154V-δ’ mutation; and one with all four mutations. Here, we term these the γ-, δ’- and γ/δ’-mutants or mutations, respectively.
The trans-acting hydroxyl groups facilitate clamp binding and opening synergistically
We investigated the hydroxyl group’s role in clamp binding affinity using a fluorescence intensity-based assay71. To monitor the binding reaction, Glu299 of the β-clamp was mutated to Cys and covalently labeled with pyrene (PY) maleimide. Because Cys299 is located on the face of the clamp where the γ complex binds, the bound and unbound states of the clamp can be distinguished by the change in the fluorescence intensity of the environmentally sensitive probe. The γ-mutations did not affect binding of the clamp loader to the β-clamp as judged by Kd,app values (Fig. 4a). The Kd,app increased 2-fold for the δ’-mutation (Fig. 4b) and 7-fold for the γ/δ’-mutations (Fig. 4c), indicating that mutating both γ and δ’ has a synergistic effect. Moreover, the absolute PY intensity for the γ/δ’-mutant was about 75% that for wt, suggesting that mutation of the trans-acting hydroxyl group might be affecting clamp opening: Due to different environmental effects on PY, a clamp loader bound to a closed clamp fluoresces differently than one bound to an open clamp72.
We investigated the hydroxyl group’s role in clamp opening using an assay based on self-quenching by neighboring fluorophores. The β clamp was covalently labeled with AF488 on two cysteine residues, one on each side of the dimer interface, but both on the same side of the clamp. Upon clamp opening, there is an increase in AF488 fluorescence due to relief of self-quenching73,74 (Fig. 4d–f). The clamp opening reaction is at least a two-step reaction that consists of an initial binding step to form a closed clamp loader-clamp complex followed by a clamp opening reaction to form an open clamp loader-clamp complex (Equation 4 in Supplementary Methods). The relative fluorescence intensity at saturating clamp loader concentrations provides information about the relative population of clamp loader-clamp complexes in an open conformation, and the dependence of fluorescence increase on the γ-clamp loader concentration provides information about the binding affinity. The γ-mutant exhibits both a smaller increase in fluorescence intensity, 80% of that of the wt clamp loader, and a 3-fold increase in Kop,app, the apparent clamp binding/opening constant (Fig. 4d). The population of open clamps appears unaffected for the δ’-mutant, as the increase in fluorescence intensity was comparable with that of the wild-type clamp loader, though Kop,app increased 3-fold (Fig. 4e). The combined γ/δ’-mutant has the largest defect in clamp binding/opening: the fluorescence intensity at saturating concentrations was 60% that of the wt clamp loader and the Kop,app increased 15-fold (Fig. 4f). Together, these results show that the γ/δ’-mutation negatively affects β-clamp binding and opening. The smaller proportion of open clamps is consistent with the decreased fluorescence intensity for the γ/δ’-mutant in the β-PY binding assay.
The trans-acting hydroxyl facilitates DNA binding
We investigated the hydroxyl group’s role in loading of the clamp onto single-strand/double-strand DNA junctions, where polymerization begins, using a fluorescence anisotropy-based assay. A primed DNA template labeled with X-rhodamine (RhX) at the 5′ template end exhibits faster rotational dynamics, and consequently a smaller anisotropy value, when free in solution than when bound to clamp loader75. In anisotropy assays, polarized emission of RhX was measured with increasing concentrations of clamp loader (Fig. 5a). To measure equilibrium DNA binding, non-hydrolyzable ATPγS was used in place of ATP to block DNA-dependent ATP hydrolysis. The wild-type clamp loader exhibited a robust increase in anisotropy, but the γ/δ’-mutant gave only a small increase in anisotropy at high DNA concentrations. The apparent dissociation constant, Kd,app, was 80 ± 18 nM for the wt complex, whereas the Kd,app for the γ/δ’-mutant was too high to determine experimentally. Thus, elimination of the trans-acting hydroxyl group severely affects ATP-dependent DNA binding activity in assays with ATPγS.
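How an apparent dissociation constant shapes an anisotropy titration can be sketched with the standard quadratic solution for a 1:1 binding equilibrium. This is an assumed model for illustration; the equations actually used to calculate equilibrium constants are those given in the Supplementary Information. Only the wt Kd,app of 80 nM comes from the text; the DNA concentration and the free and bound anisotropy values are hypothetical, and the free and bound probe are assumed to have equal fluorescence intensity.

```python
import math


def fraction_dna_bound(loader_nM, dna_nM, kd_nM):
    """Fraction of labeled DNA in complex for a 1:1 equilibrium (quadratic solution)."""
    b = loader_nM + dna_nM + kd_nM
    complex_nM = (b - math.sqrt(b * b - 4.0 * loader_nM * dna_nM)) / 2.0
    return complex_nM / dna_nM


def anisotropy(loader_nM, dna_nM, kd_nM, r_free, r_bound):
    """Observed anisotropy as a population-weighted average of free and bound DNA."""
    f = fraction_dna_bound(loader_nM, dna_nM, kd_nM)
    return r_free + (r_bound - r_free) * f


KD_WT_NM = 80.0          # wt Kd,app reported in the text
DNA_NM = 50.0            # hypothetical labeled DNA concentration
R_FREE, R_BOUND = 0.10, 0.30  # hypothetical anisotropy limits

for loader in (0.0, 50.0, 100.0, 200.0, 400.0, 800.0):
    r = anisotropy(loader, DNA_NM, KD_WT_NM, R_FREE, R_BOUND)
    print(f"{loader:6.0f} nM clamp loader -> anisotropy {r:.3f}")
```

With a much larger dissociation constant the same titration barely rises above the free-DNA value over this concentration range, which corresponds to a Kd,app that is too high to determine experimentally.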
The trans-acting hydroxyl contributes to ATP hydrolysis
We investigated the hydroxyl group’s role in ATP hydrolysis using a coupled enzyme assay, in which each mole of ADP produced is coupled to the oxidation of one mole of NADH to NAD+ 76 (Fig. S3). The value of kcat for the γ/δ’-mutant was reduced the most, by a factor of about 18 (Fig. 5b). Given that ATP hydrolysis is stimulated by DNA, the DNA-dependent ATPase activity was also measured (Fig. 5c). With increasing DNA concentration, the rate of ATP hydrolysis increased for both wt and γ/δ’-mutant complexes reaching 126 ± 7 and 85 ± 8 nM/s, respectively, in the presence of 1 μM DNA. Hence, DNA rescues the ATP hydrolysis activity of the γ/δ’-mutant leading to less than a 2-fold difference in rate from the wt at DNA concentrations ≥125 nM. The γ/δ’ mutant’s lower apparent DNA binding activity as measured in anisotropy assays (Fig. 5a), which substituted the non-hydrolyzable ATP analog ATPγS for ATP, versus these ATPase assays may be due to its inability, when bound to ATPγS, to induce ATP-dependent conformational changes that increase binding affinity for the β clamp and DNA71,77,78.
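Because the coupled assay consumes one NADH per ADP produced, a hydrolysis rate follows directly from the rate of NADH loss. The sketch below assumes the reaction is followed by NADH absorbance at 340 nm with a 1 cm path length (the text does not state the readout) and uses the standard NADH molar extinction coefficient of 6220 M⁻¹ cm⁻¹; the function names and the example slope are hypothetical. The fold-difference calculation uses the rates reported above for 1 μM DNA.

```python
NADH_EXTINCTION_340 = 6220.0  # M^-1 cm^-1, standard value for NADH at 340 nm


def atp_hydrolysis_rate_nM_per_s(a340_slope_per_s, path_cm=1.0):
    """Convert the slope of A340 versus time into an ATP hydrolysis rate.

    One NADH is oxidized per ADP formed, so the rate of ATP hydrolysis
    equals the rate of NADH disappearance (a negative absorbance slope).
    """
    molar_per_s = -a340_slope_per_s / (NADH_EXTINCTION_340 * path_cm)
    return molar_per_s * 1e9  # M/s -> nM/s


def fold_difference(wt_rate, mutant_rate):
    """How many times faster the wt rate is than the mutant rate."""
    return wt_rate / mutant_rate


# Hypothetical absorbance slope corresponding to 100 nM/s.
print(atp_hydrolysis_rate_nM_per_s(-6.22e-4))

# Rates reported in the text at 1 uM DNA: 126 (wt) and 85 (mutant) nM/s.
print(round(fold_difference(126.0, 85.0), 2))
```

The ratio of the two reported rates is about 1.5, consistent with the statement that DNA brings the mutant to within 2-fold of the wt.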
Eliminating the trans-acting hydroxyl does not affect ATP binding
Mutant defects in ATP hydrolysis and ATP-dependent ligand binding could be due to defects in ATP binding. To test this, ATP binding affinities were assayed (Fig. 5d) using differential scanning fluorimetry (DSF) for the wt and the γ/δ’-mutant clamp loaders, which showed the largest differences in each of the previous assays. Because ligand binding generally increases protein thermal stability, DSF is often used as a measure of ligand binding79. When solutions contained clamp loader only, Tm values were 54.7 ± 0.6 °C for the wt and 52.7 ± 0.3 °C for the γ/δ’-mutant, indicating that the mutant was inherently less stable than wt. Divalent magnesium is required for coordination of the triphosphate in the ATP binding site, and addition of 8 mM magnesium chloride (MgCl2) decreased the Tm values for both clamp loaders: to 52.3 ± 0.6 °C and 52.0 ± 0.5 °C for the wt and mutant, respectively. Addition of MgCl2 to the clamp loaders also resulted in the appearance of a second peak in the denaturation curve at 60 °C for both clamp loaders, suggesting that two different conformational states may be present. Addition of ATP only to clamp loaders increased Tm values to 56.3 ± 0.6 °C for the wt and 56.3 ± 0.3 °C for the mutant, indicating that ATP stabilizes the complex. Finally, addition of both ATP and MgCl2 gave the largest increase in thermal stability with Tm values of 57.0 ± 1.0 °C for wt and 57.7 ± 0.6 °C for the mutant. In the presence of both ATP and Mg2+, the wild-type and mutant clamp loaders show the same thermal stability, indicating that the γ/δ’-mutant is binding ATP, and that deficiencies in ATP-dependent ligand interactions and ATP hydrolysis are due to intrinsic defects in those activities.
Overall effect of the hydroxyl group on clamp loading activity
The experiments above measured each of the individual clamp loader-ligand interactions needed for DNA clamp loading, including ATP binding and hydrolysis, clamp binding and opening, and DNA binding. Mutation of the trans-acting hydroxyl to a methyl group affected each of these clamp loader-ligand interactions except for ATP binding. To determine how these deficits would affect the overall clamp loading activity, ATP hydrolysis activities of the wt and mutant clamp loaders were measured in steady-state clamp loading assays. When a clamp is loaded onto DNA, an ATP molecule at each of the binding sites in the clamp loader is hydrolyzed80,81, and thus ATP hydrolysis will report on clamp loading. As expected for the wt clamp loader, ATP hydrolysis activity is the greatest when coupled to clamp loading and is 35-fold faster than in assays with no DNA or β-clamp. ATPase activity of the mutants is also increased relative to activity in assays with no DNA or clamp, but all the mutants are still less active than the wt clamp loader (Fig. 5e). Compared to wt, rates of ATP hydrolysis coupled to clamp loading are 10-, 2-, and 4-fold slower for the γ-, δ’-, and γ/δ’-mutants, respectively. Interestingly, the ATPase activity of the γ/δ’-mutant was rescued to some degree by addition of both the clamp and DNA and was 40 times faster than in assays without DNA and the clamp.
To provide mechanistic clues to protein functional specificity, DARC characterizes six types of sequence/structural constraints (CP, CDC, 3DCDC, 3DCP, DCCP, and CLCP) and provides corresponding statistical significance estimates. When applied to bacterial DNA clamp loaders, it identifies distinguishing features of γ and δ’ AAA+ subunits that are congruent with our current understanding of these proteins. Certain γ and γ/δ’ residues interact with the active site and cluster around the catalytic base with high significance. Other γ/δ’ residues and six high DC-scoring pairs are associated with the α2 and α3 helices’ N-termini, which interact with DNA, whereas other constraints are associated with predicted clamp binding loops.
Whether or not certain mechanistic interpretations are correct, conservation of these features across evolutionary time argues for their functional relevance. Therefore, presumably these residues allosterically channel the energy of ATP hydrolysis into coordinated conformational changes required to load the β clamp onto DNA. Note, however, that eukaryotic and archaeal DNA clamp loaders lack the features distinctive of bacterial clamp loaders and thus presumably utilize a different mechanism.
The most distinguishing γ/δ’ feature, which is therefore likely to play a key role in bacterial clamp loader functional specificity, is a threonine hydroxyl group that interacts in trans with an adjacent active site. Mutation of the hydroxyl to a methyl group in both the γ and δ’ subunits leads to a decrease in clamp binding and opening, ATP hydrolysis and DNA binding, indicating that the hydroxyl group contributes to both ATP-dependent ligand binding and ATP hydrolysis. Although DNA stimulates ATP hydrolysis82,83, clamp loaders slowly hydrolyze ATP in the absence of DNA, and the γ/δ’ mutant’s DNA-independent ATPase activity is 18-fold lower than for the wild-type. This is consistent with a role for the hydroxyl group in ATP hydrolysis, as was previously hypothesized84. Notably, DNA and the clamp rescue γ/δ’-mutant ATPase activity perhaps due to ligand-dependent conformational changes that bring catalytic residues into the optimal geometry for ATP hydrolysis.
Because essentially all other AAA+ ATPases hydrolyze ATP without this threonine and because γ retains the common AAA+ catalytic residues, the hydroxyl group may participate in hydrolysis in a specific manner conducive to bacterial clamp loading. It may assist key steps in the clamp loader reaction by forming a hydrogen bond (perhaps via a water molecule) with ATP or with one or two residues surrounding the active site. Upon interaction of DNA with the N-termini of the (γ and δ′) α2 and α3 helices, the network of γ- and γ/δ’-residues and of high-scoring DC-pairs may allosterically mediate conformational changes involved in the clamp loading reaction that, among other effects, may cause the taThrγδ’ residue to form or disrupt such a hydrogen bond. In any case, the reduction of all ATP-dependent activities in the γ/δ’-mutant shows that the trans-acting hydroxyl group plays a key role in clamp loader functional specificity.
By combining BPPS with DCA, DARC identifies key features that either approach alone would have overlooked. For example, DCA of all AAA+ proteins or of all non-clamp loader AAA+ proteins fails to identify the bacterial-clamp-loader-specific residue pairs that were revealed through DCA of the γ/δ’ subgroup defined by BPPS (Table 1 and Fig. S2). The DCA and BPPS analyses likewise synergize with DARC estimation of 3DCDC, 3DCP, DCCP, and CLCP constraints, which revealed, for example, that both the γ- and the γ/δ’-residues cluster around the catalytic base with high significance. Hence, this study illustrates DARC’s general utility for investigating multifaceted aspects of protein functional specificity.
Data generated or analyzed during this study, if not included in this article or as Supplementary Information, are available at www.igs.umaryland.edu/labs/neuwald/software/darc/.
Neuwald, A. F., Aravind, L., Spouge, J. L. & Koonin, E. V. AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res 9, 27–43 (1999).
Tucker, P. A. & Sallai, L. The AAA+ superfamily–a myriad of motions. Curr Opin Struct Biol 17, 641–652, https://doi.org/10.1016/j.sbi.2007.09.012 (2007).
Capra, J. A. & Singh, M. Characterization and prediction of residues determining protein functional specificity. Bioinformatics 24, 1473–1480, https://doi.org/10.1093/bioinformatics/btn214 (2008).
Fischer, J. D., Mayer, C. E. & Söding, J. Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24, 613–620, https://doi.org/10.1093/bioinformatics/btm626 (2008).
Kalinina, O. V., Gelfand, M. S. & Russell, R. B. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 10, 174, https://doi.org/10.1186/1471-2105-10-174 (2009).
Casari, G., Sander, C. & Valencia, A. A method to predict functional residues in proteins. Nat Struct Biol 2, 171–178 (1995).
Chakraborty, A. & Chakrabarti, S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 16, 71–88, https://doi.org/10.1093/bib/bbt092 (2015).
Gaucher, E. A., Gu, X., Miyamoto, M. M. & Benner, S. A. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem Sci 27, 315–321 (2002).
Hannenhalli, S. S. & Russell, R. B. Analysis and prediction of functional sub-types from protein sequence alignments. J Mol Biol 303, 61–76 (2000).
Janda, J. O., Busch, M., Kuck, F., Porfenenko, M. & Merkl, R. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinformatics 13, 55, https://doi.org/10.1186/1471-2105-13-55 (2012).
Janda, J. O. et al. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 15, 118, https://doi.org/10.1186/1471-2105-15-118 (2014).
Kalinina, O. V., Mironov, A. A., Gelfand, M. S. & Rakhmaninova, A. B. Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13, 443–456, https://doi.org/10.1110/ps.03191704 (2004).
Kolesov, G. & Mirny, L. A. Using evolutionary information to find specificity-determining and co-evolving residues. Methods Mol Biol 541, 421–448, https://doi.org/10.1007/978-1-59745-243-4_18 (2009).
Livingstone, C. D. & Barton, G. J. Identification of functional residues and secondary structure from protein multiple sequence alignment. Methods Enzymol 266, 497–512 (1996).
Marttinen, P., Corander, J., Toronen, P. & Holm, L. Bayesian search of functionally divergent protein subgroups and their function specific residues. Bioinformatics 22, 2466–2474, https://doi.org/10.1093/bioinformatics/btl411 (2006).
Mirny, L. A. & Gelfand, M. S. Using orthologous and paralogous proteins to identify specificity determining residues. Genome Biol 3, PREPRINT0002 (2002).
Pirovano, W., Feenstra, K. A. & Heringa, J. Sequence comparison by sequence harmony identifies subtype-specific functional sites. Nucleic Acids Res 34, 6540–6548, https://doi.org/10.1093/nar/gkl901 (2006).
Sankararaman, S. & Sjölander, K. INTREPID–INformation-theoretic TREe traversal for Protein functional site IDentification. Bioinformatics 24, 2445–2452, https://doi.org/10.1093/bioinformatics/btn474 (2008).
Wilkins, A., Erdin, S., Lua, R. & Lichtarge, O. Evolutionary trace for prediction and redesign of protein functional sites. Methods Mol Biol 819, 29–42, https://doi.org/10.1007/978-1-61779-465-0_3 (2012).
Xin, F. & Radivojac, P. Computational methods for identification of functional residues in protein structures. Curr Protein Pept Sci 12, 456–469 (2011).
Ye, K., Feenstra, K. A., Heringa, J., Ijzerman, A. P. & Marchiori, E. Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting. Bioinformatics 24, 18–25, https://doi.org/10.1093/bioinformatics/btm537 (2008).
Choudhary, P., Kumar, S., Bachhawat, A. K. & Pandit, S. B. CSmetaPred: a consensus method for prediction of catalytic residues. BMC Bioinformatics 18, 583, https://doi.org/10.1186/s12859-017-1987-z (2017).
Pai, P. P., Dattatreya, R. K. & Mondal, S. Ensemble Architecture for Prediction of Enzyme-ligand Binding Residues Using Evolutionary Information. Mol Inform 36, https://doi.org/10.1002/minf.201700021 (2017).
Pai, P. P., Ranjani, S. S. & Mondal, S. PINGU: PredIction of eNzyme catalytic residues usinG seqUence information. PLoS One 10, e0135122, https://doi.org/10.1371/journal.pone.0135122 (2015).
Chakrabarti, S. & Panchenko, A. R. Ensemble approach to predict specificity determinants: benchmarking and validation. BMC Bioinformatics 10, 207, https://doi.org/10.1186/1471-2105-10-207 (2009).
Dessimoz, C., Skunca, N. & Thomas, P. D. CAFA and the open world of protein function predictions. Trends in genetics: TIG 29, 609–610, https://doi.org/10.1016/j.tig.2013.09.005 (2013).
Jiang, Y., Clark, W. T., Friedberg, I. & Radivojac, P. The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective. Bioinformatics 30, i609–616, https://doi.org/10.1093/bioinformatics/btu472 (2014).
Lichtarge, O., Bourne, H. R. & Cohen, F. E. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257, 342–358 (1996).
Mihalek, I., Res, I. & Lichtarge, O. A family of evolution-entropy hybrid methods for ranking protein residues by importance. J Mol Biol 336, 1265–1282, https://doi.org/10.1016/j.jmb.2003.12.078 (2004).
Lockless, S. W. & Ranganathan, R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299 (1999).
Halabi, N., Rivoire, O., Leibler, S. & Ranganathan, R. Protein sectors: evolutionary units of three-dimensional structure. Cell 138, 774–786, https://doi.org/10.1016/j.cell.2009.07.038 (2009).
Wang, S. W., Bitbol, A. F. & Wingreen, N. S. Revealing evolutionary constraints on proteins through sequence analysis. PLoS Comput Biol 15, e1007010, https://doi.org/10.1371/journal.pcbi.1007010 (2019).
Tanwar, A. S., Goyal, V. D., Choudhary, D., Panjikar, S. & Anand, R. Importance of hydrophobic cavities in allosteric regulation of formylglycinamide synthetase: insight from xenon trapping and statistical coupling analysis. PLoS One 8, e77781, https://doi.org/10.1371/journal.pone.0077781 (2013).
Reynolds, K. A., McLaughlin, R. N. & Ranganathan, R. Hot spots for allosteric regulation on protein surfaces. Cell 147, 1564–1575, https://doi.org/10.1016/j.cell.2011.10.049 (2011).
Reynolds, K. A., Russ, W. P., Socolich, M. & Ranganathan, R. Evolution-based design of proteins. Methods Enzymol 523, 213–235, https://doi.org/10.1016/B978-0-12-394292-0.00010-2 (2013).
Tesileanu, T., Colwell, L. J. & Leibler, S. Protein sectors: statistical coupling analysis versus conservation. PLoS Comput Biol 11, e1004091, https://doi.org/10.1371/journal.pcbi.1004091 (2015).
Morcos, F. & Onuchic, J. N. The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 56, 179–186, https://doi.org/10.1016/j.sbi.2019.03.024 (2019).
Cocco, S., Monasson, R. & Weigt, M. From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction. PLoS Comput Biol 9, e1003176, https://doi.org/10.1371/journal.pcbi.1003176 (2013).
Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences of the United States of America 108, E1293–1301, https://doi.org/10.1073/pnas.1111471108 (2011).
Hopf, T. A. et al. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149, 1607–1621, https://doi.org/10.1016/j.cell.2012.04.012 (2012).
Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190, https://doi.org/10.1093/bioinformatics/btr638 (2012).
Lunt, B. et al. Inference of direct residue contacts in two-component signaling. Methods Enzymol 471, 17–41, https://doi.org/10.1016/S0076-6879(10)71002-8 (2010).
Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One 6, e28766, https://doi.org/10.1371/journal.pone.0028766 (2011).
Marks, D. S., Hopf, T. A. & Sander, C. Protein structure prediction from sequence variation. Nat Biotechnol 30, 1072–1080, https://doi.org/10.1038/nbt.2419 (2012).
Nugent, T. & Jones, D. T. Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proceedings of the National Academy of Sciences of the United States of America 109, E1540–1547, https://doi.org/10.1073/pnas.1120036109 (2012).
Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. Identification of direct residue contacts in protein-protein interaction by message passing. Proceedings of the National Academy of Sciences of the United States of America 106, 67–72, https://doi.org/10.1073/pnas.0805923106 (2009).
Baldassi, C. et al. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS One 9, e92721, https://doi.org/10.1371/journal.pone.0092721 (2014).
Seemayer, S., Gruber, M. & Söding, J. CCMpred–fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 30, 3128–3130, https://doi.org/10.1093/bioinformatics/btu500 (2014).
Neuwald, A. F. & Altschul, S. F. Statistical investigations of protein residue direct couplings. PLoS Comput Biol 14, e1006237, https://doi.org/10.1371/journal.pcbi.1006237 (2018).
Neuwald, A. F. Protein domain hierarchy Gibbs sampling strategies. Statistical Applications in Genetics and Molecular Biology 13, 497–517, https://doi.org/10.1515/sagmb-2014-0008 (2014).
Neuwald, A. F. A Bayesian sampler for optimization of protein domain hierarchies. J Comput Biol 21, 269–286, https://doi.org/10.1089/cmb.2013.0099 (2014).
Neuwald, A. F., Aravind, L. & Altschul, S. F. Inferring joint sequence-structural determinants of protein functional specificity. Elife 7, https://doi.org/10.7554/eLife.29880 (2018).
Neuwald, A. F. & Altschul, S. F. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 12, e1005294, https://doi.org/10.1371/journal.pcbi.1005294 (2016).
Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Physical Review E 87, 012707, https://doi.org/10.1103/PhysRevE.87.012707 (2013).
Neuwald, A. F. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences. Bioinformatics 25, 1869–1875 (2009).
NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44, D7–19, https://doi.org/10.1093/nar/gkv1290 (2016).
Kong, X. P., Onrust, R., O’Donnell, M. & Kuriyan, J. Three-dimensional structure of the beta subunit of E. coli DNA polymerase III holoenzyme: a sliding DNA clamp. Cell 69, 425–437 (1992).
McHenry, C. S. DNA replicases from a bacterial perspective. Annu Rev Biochem 80, 403–436, https://doi.org/10.1146/annurev-biochem-061208-091655 (2011).
Stukenberg, P. T., Studwell-Vaughan, P. S. & O’Donnell, M. Mechanism of the sliding beta-clamp of DNA polymerase III holoenzyme. J Biol Chem 266, 11328–11334 (1991).
Waga, S. & Stillman, B. The DNA replication fork in eukaryotic cells. Annu Rev Biochem 67, 721–751, https://doi.org/10.1146/annurev.biochem.67.1.721 (1998).
Hedglin, M., Kumar, R. & Benkovic, S. J. Replication clamps and clamp loaders. Cold Spring Harb Perspect Biol 5, a010165, https://doi.org/10.1101/cshperspect.a010165 (2013).
Indiani, C. & O’Donnell, M. The replication clamp-loading machine at work in the three domains of life. Nat Rev Mol Cell Biol 7, 751–761, https://doi.org/10.1038/nrm2022 (2006).
Kelch, B. A., Makino, D. L., O’Donnell, M. & Kuriyan, J. Clamp loader ATPases and the evolution of DNA replication machinery. BMC Biol 10, 34, https://doi.org/10.1186/1741-7007-10-34 (2012).
Jeruzalmi, D., O’Donnell, M. & Kuriyan, J. Crystal structure of the processivity clamp loader gamma (γ) complex of E. coli DNA polymerase III. Cell 106, 429–441 (2001).
Simonetta, K. R. et al. The mechanism of ATP-dependent primer-template recognition by a clamp loader complex. Cell 137, 659–671, https://doi.org/10.1016/j.cell.2009.03.044 (2009).
Neuwald, A. F. Bayesian shadows of molecular mechanisms cast in the light of evolution. Trends Biochem Sci 31, 374–382, https://doi.org/10.1016/j.tibs.2006.05.002 (2006).
Bowman, G. D., O’Donnell, M. & Kuriyan, J. Structural analysis of a eukaryotic sliding DNA clamp-clamp loader complex. Nature 429, 724–730, https://doi.org/10.1038/nature02585 (2004).
Kelch, B. A., Makino, D. L., O’Donnell, M. & Kuriyan, J. How a DNA polymerase clamp loader opens a sliding clamp. Science 334, 1675–1680, https://doi.org/10.1126/science.1211884 (2011).
Hattendorf, D. A. & Lindquist, S. L. Cooperative kinetics of both Hsp104 ATPase domains and interdomain communication revealed by AAA sensor-1 mutants. EMBO J 21, 12–21, https://doi.org/10.1093/emboj/21.1.12 (2002).
Schumacher, J. et al. Sensor I threonine of the AAA+ ATPase transcriptional activator PspF is involved in coupling nucleotide triphosphate hydrolysis to the restructuring of sigma 54-RNA polymerase. J Biol Chem 282, 9825–9833, https://doi.org/10.1074/jbc.M611532200 (2007).
Thompson, J. A., Paschall, C. O., O’Donnell, M. & Bloom, L. B. A slow ATP-induced conformational change limits the rate of DNA binding but not the rate of beta clamp binding by the Escherichia coli gamma complex clamp loader. J Biol Chem 284, 32147–32157, https://doi.org/10.1074/jbc.M109.045997 (2009).
Hayner, J. N. & Bloom, L. B. The beta sliding clamp closes around DNA prior to release by the Escherichia coli clamp loader gamma complex. J Biol Chem 288, 1162–1170, https://doi.org/10.1074/jbc.M112.406231 (2013).
Donaphon, B., Bloom, L. B. & Levitus, M. Photophysical characterization of interchromophoric interactions between rhodamine dyes conjugated to proteins. Methods Appl Fluoresc 6, 045004, https://doi.org/10.1088/2050-6120/aad20f (2018).
Paschall, C. O. et al. The Escherichia coli clamp loader can actively pry open the beta-sliding clamp. J Biol Chem 286, 42704–42714, https://doi.org/10.1074/jbc.M111.268169 (2011).
Bloom, L. B. et al. Dynamics of loading the beta sliding clamp of DNA polymerase III onto DNA. J Biol Chem 271, 30699–30708 (1996).
Norby, J. G. Coupled assay of Na+,K+-ATPase activity. Methods Enzymol 156, 116–119 (1988).
Anderson, S. G., Williams, C. R., O’Donnell, M. & Bloom, L. B. A function for the psi subunit in loading the Escherichia coli DNA polymerase sliding clamp. J Biol Chem 282, 7035–7045, https://doi.org/10.1074/jbc.M610136200 (2007).
Naktinis, V., Onrust, R., Fang, L. & O’Donnell, M. Assembly of a chromosomal replication machine: two DNA polymerases, a clamp loader, and sliding clamps in one holoenzyme particle. II. Intermediate complex between the clamp loader and its clamp. J Biol Chem 270, 13358–13365 (1995).
Niesen, F. H., Berglund, H. & Vedadi, M. The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nat Protoc 2, 2212–2221, https://doi.org/10.1038/nprot.2007.321 (2007).
Bertram, J. G. et al. Molecular mechanism and energetics of clamp assembly in Escherichia coli. The role of ATP hydrolysis when gamma complex loads beta on DNA. J Biol Chem 275, 28413–28420, https://doi.org/10.1074/jbc.M910441199 (2000).
Hingorani, M. M., Bloom, L. B., Goodman, M. F. & O’Donnell, M. Division of labor–sequential ATP hydrolysis drives assembly of a DNA polymerase sliding clamp around DNA. EMBO J 18, 5131–5144, https://doi.org/10.1093/emboj/18.18.5131 (1999).
Lee, S. H. & Walker, J. R. Escherichia coli DnaX product, the tau subunit of DNA polymerase III, is a multifunctional protein with single-stranded DNA-dependent ATPase activity. Proceedings of the National Academy of Sciences of the United States of America 84, 2713–2717 (1987).
Onrust, R., Stukenberg, P. T. & O’Donnell, M. Analysis of the ATPase subassembly which initiates processive DNA synthesis by DNA polymerase III holoenzyme. J Biol Chem 266, 21681–21686 (1991).
Neuwald, A. F. Hypothesis: bacterial clamp loader ATPase activation through DNA-dependent repositioning of the catalytic base and of a trans-acting catalytic threonine. Nucleic Acids Res 34, 5280–5290, https://doi.org/10.1093/nar/gkl519 (2006).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637, https://doi.org/10.1002/bip.360221211 (1983).
F.T. was supported by NIH-NIAID T32 training grant 2T32AI007110. The L.B.B. laboratory is supported by NIH grant R03AI1135579 and NSF grant MCB-1817869. S.F.A. is supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine. A.F.N. is supported by NIH-NIGMS grant R01GM125878. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors declare no competing interests.
Tondnevis, F., Dudenhausen, E.E., Miller, A.M. et al. Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 10, 1691 (2020). https://doi.org/10.1038/s41598-019-55118-6