A litmus test for classifying recognition mechanisms of transiently binding proteins

Partner recognition in protein binding is critical for all biological functions, and yet, delineating its mechanism is challenging, especially when recognition happens within microseconds. We present a theoretical and experimental framework based on straight-forward nuclear magnetic resonance relaxation dispersion measurements to investigate protein binding mechanisms on sub-millisecond timescales, which are beyond the reach of standard rapid-mixing experiments. This framework predicts that conformational selection prevails on ubiquitin’s paradigmatic interaction with an SH3 (Src-homology 3) domain. By contrast, the SH3 domain recognizes ubiquitin in a two-state binding process. Subsequent molecular dynamics simulations and Markov state modeling reveal that the ubiquitin conformation selected for binding exhibits a characteristically extended C-terminus. Our framework is robust and expandable for implementation in other binding scenarios with the potential to show that conformational selection might be the design principle of the hubs in protein interaction networks.

P rotein-ligand or protein-protein interactions underpin biological control mechanisms, and a detailed kinetic understanding of the interactions with atomic resolution is necessary to develop drug molecules. The role of molecular motion in these interactions is a very long-standing question, especially in the regime of fast kinetics 1 . Therefore, characterization of protein-ligand or protein-protein (protein-partner) interactions has been of interest for a long time, and the development of new experimental methods is crucial [2][3][4][5][6][7][8][9][10][11][12][13][14] . This line of research is critical to disentangle almost all molecular recognition in a cell 15 , including understanding the binding mechanism in terms of two-state binding vs. three-state binding via conformational selection or induced fit [16][17][18] .
In conformational selection 19,20 and induced fit 21 , a conformational change occurs either prior to or after binding (Fig. 1). Prominent examples for induced fit include conformational changes from an open to a closed protein conformation after ligand binding 22 . Here, induced fit as binding mechanism can be directly deduced from protein structures if the entrance to the ligand-binding site is sterically blocked in the closed conformation of the bound form 22 . Other prominent examples for induced fit are protein systems with two bound forms of disordered fragments observed in nuclear magnetic resonance (NMR) experiments 23,24 . In the pKID/KIX system, the exchange between the free form and the bound forms is slow on the chemical shift timescale, which results in distinct peaks of these forms in NMR spectra 23 . Threestate fitting of NMR relaxation dispersion data and characteristic chemical shift changes during titration then directly evidence the existence of a binding mechanism with three states, whose structural identity can also be derived from chemical shift changes. Conformational selection in protein binding has been pioneered in NMR experiments that demonstrated conformational exchanges in the free protein form that are comparable to structural changes between the free and bound forms 7,25 . Relaxation-dispersion NMR methods to characterize low-populated conformations in free protein forms have been recently extended 26 using paramagnetically induced pseudocontact shifts to increase the chemical shift range between different conformations 27 . But as a binding mechanism, conformational selection requires the additional kinetic proof that excited states observed e.g. in NMR experiments of the free form are on-pathway in the binding reaction. For protein binding reactions with relaxation times of milliseconds to seconds, such a kinetic proof can be provided by stopped-flow mixing experiments 18,28,29 . However, a general approach to investigate protein binding mechanisms is missing on submillisecond time scales where stopped flow is too slow or where the exchange between the free and bound protein forms is fast on the NMR chemical shift timescale under all stoichiometric conditions.
Here, we report the development of a theoretical and experimental framework for investigating protein-partner interaction with recognition kinetics down to tens of microseconds with atomistic detail. The framework can be applied to any binding regime but is particularly insightful for weak, transient binding with off-rates k off larger than 1000 s −1 . The kinetics of ligand binding are measured using high-power relaxation dispersion experiments, which have been shown to reveal kinetics down to single digit microseconds in individual proteins 30 . High-power relaxation dispersion is uniquely advantageous for measurement of both slow (<1000 s −1 ) and fast (up to~37,000 s −1 ) kinetics in comparison to R 1ρ or off-resonance R 1ρ experiments regarding both experimental setup and data analysis 30 . We apply our framework to analyze the binding of the paradigmatic protein ubiquitin 7 to its partner protein, the SH3c domain of CIN85. The interaction of ubiquitin and the SH3c domain is weak and transient (with dissociation constant K d = 370 ± 15 μM from NMR titrations), akin to many other biologically important interactions. We show with the measurement of concentrationdependent kinetics using relaxation dispersion in the fastexchange regime ( Supplementary Fig. 1) that three-state binding via conformational selection dominates the kinetics of the binding on the side of ubiquitin. For the partner protein SH3c, we find consistence with two-state binding, in agreement with threestate conformational selection on the ubiquitin side. This concentration-dependent relaxation dispersion measurement and fitting procedure constitutes a litmus test for the recognition mechanism. In a subsequent step, we use molecular dynamics simulations and Markov state modeling [31][32][33][34][35][36][37][38][39][40][41] to identify the ubiquitin conformation selected for binding. This bindingcompetent ubiquitin conformation exhibits a characteristically extended C-terminus.
Ubiquitin is a hub of the cellular interaction network. At the same time, CIN85 is an adapter molecule that controls the spatial and temporal assembly of multi-protein complexes by its three  SH3 domains that bind other partners 42 . Thus, the interaction of ubiquitin with the third SH3 domain of CIN85 (SH3c) is the natural choice for testing our framework. Besides, given that the precise interaction of ubiquitin with many partners is a hallmark of cell function, it makes one wonder if there is a conformational selection mechanism in ubiquitin, where a minor "bindingcompatible" conformation binds the partner specifically. Previous work has shown that free ubiquitin consists of an ensemble of conformations, including the "bound-like" conformations seen in ubiquitin complexes, thus supporting conformational selection as the binding mechanism 7 . Subsequent experimental and computational work has delineated the dynamic modes of ubiquitin in granular details 43 . Here we show how our expandable theoretical and experimental framework brings together the different internal dynamics and explains this paradigmatic protein-partner interaction.

Results and discussion
Measuring ligand-concentration dependent high-power relaxation dispersion enables distinguishing between binding mechanisms. We characterized both proteins starting with ubiquitin since we had previously demonstrated that bindingcompetent conformations exist in the ubiquitin ensemble in the absence of binding partners 7 . The presence of binding-competent conformations is a necessary but not sufficient condition for conformational selection as the question is essentially about binding kinetics 16,44 . We previously determined fast conformational transitions with exchange rates k ex larger than about 20,000 s −1 in free ubiquitin using relaxation dispersion 43 . NMR titration indicates also fast exchange between the free and bound forms of ubiquitin and SH3c, because we observed only one cross peak for all ratios of ubiquitin and SH3c ( Supplementary Fig. 1), and because the intensity of this peak decreased monotonously with increasing formation of the complex ( Supplementary Fig. 1e, i). Thus, we determined the concentration-dependent exchange rate k ex of the complex formation at different partner concentrations as in our previous experiments on free ubiquitin by fitting the relaxation rates R 2,eff with the fast-exchange Luz-Meiboom equation 45 (Fig. 2a, b and "Methods"). To gain insight on the binding mechanism from these experimentally determined k ex values, we have developed analytical equations for the variation of k ex with varying partner concentration, for two-state binding without (kinetically) relevant conformational change during binding, and for three-state binding with a conformational change prior to the binding step (conformational selection), or after binding (induced fit) (Fig. 1). These equations for k ex hold at all concentrations of the proteins, in contrast to related equations for the dominant relaxation rate k obs of stopped-flow mixing experiments derived under the 'pseudo-first assumption' of an excess concentration of one of the binding partners (Supplementary Methods). The analytical equations guided us in selecting conditions for the measurement of kinetic parameters for the ubiquitin-SH3c system, starting with a ratio of 1 (SH3c) to 50 (ubiquitin) and increasing the concentration of SH3c to 1:1. This wide sub-stoichiometric range of ubiquitin-SH3c ratios allows to identify the slope and curvature of k ex as a function of the SH3c concentration [L] 0 and to determine the unbinding rate k off in the limit [L] 0 to 0 (Fig. 1 Supplementary Fig. 2). The exchange rate k ex decreases with increasing total concentration of the binding partner SH3c at the large majority of the 22 residue positions ( Fig. 2c and Supplementary Fig. 3), which signifies conformational selection and excludes two-state binding and induced fit (Fig. 1). The rate parameters for the conformationalselection model obtained from fitting of the concentrationdependent k ex data at the 22 residues positions are overall consistent (Fig. 2e, f) and support conformational selection of a lowpopulated, excited ubiquitin conformation prior to binding to SH3c. Weighted averaging of the fitted rate parameters leads to the conformational excitation rate k 12 = 1280 ± 170 s −1 and to the unbinding rate k off = k − = 2420 ± 140 s −1 (dashed blue lines in Fig. 2e, f). The population k 12 /(k 12 + k 21 ) of the excited unbound ubiquitin conformation is not larger than about 6.5% because the rate k 12 + k 21 for the conformational exchange in free ubiquitin is not smaller than about 20,000 s −1 according to previous measurements 43 .
Measurements on the side of SH3c reveal 12 residue positions with k ex values affected by ubiquitin as binding partner (Supplementary Methods, Supplementary Table 3, and Supplementary Fig. 2). The k ex curves at these 12 residue positions are consistent with two-state binding, in agreement with a conformational-selection three-state binding mechanism for ubiquitin, in which SH3c has two states ( Fig. 2d and Supplementary Fig. 4). Weighted averaging of the single fit parameter k off leads to k off = 1.43 ± 0.04 ms −1 (or 1430 ± 40 s −1 ), which is close to the unbinding rate k off obtained from the fits of the conformational-selection model on the side of ubiquitin.
Markov modeling identifies a ubiquitin C-terminal mode as conformational-selection mode. To identify the ubiquitin conformation selected for binding, we carried out approximately 1.68 ms of molecular dynamics simulations and used these to build a Markov state model (MSM) 31,32 that describes the conformational dynamics during binding as a kinetic network of metastable states (see Methods). The most stable state of the MSM is a structurally diverse, bound state that encompasses two published ubiquitin:SH3c models (PDB 2K6D and 2JT4) 46,47 . A comparison to previously reported distances derived from paramagnetic relaxation enhancement (PRE) measurements 46 indicates that this bound state of our MSM recapitulates the experimental bound state well ( Supplementary Fig. 5). Based on this bound state and an unbound state in which the distance of ubiquitin and SH3c is larger than 1 nm, we employ transition path theory [48][49][50] to compute a committor probability, or binding probability, p bind that quantifies the progress along the binding transition paths of the MSM (see "Methods"). We use adaptive sampling to access intermediate and unbound states with p bind < 1 41 . Overall, the MSM resolves the reversible binding process of ubiquitin and SH3c in atomic detail and predicts a dissociation constant of binding that agrees with the experimental value within the statistical uncertainty ("Methods"). Markov state modeling and molecular dynamics simulations have been previously used to investigate the conformational changes of proteins during binding to small ligands 38,39,[51][52][53] and the binding-induced folding of disordered peptides 40,[54][55][56][57] .
A peptide-flip motion between "in" and "out" conformations of ubiquitin emerged as a slow motion in earlier work 43 and, thus, as possible candidate of a conformational-selection mode. However, the previously described mutant G53A that locks ubiquitin almost fully into the "out" conformation along the peptide-flip motion and the novel G53(D)T mutant (chemically synthesized with (D)-Threonine at position 53 and E24 15 N labeled as a reporter) that locks ubiquitin almost fully into the "in" conformation (see Methods) do not have a large effect on the dissociation constant K d of ubiquitin and SH3c, with K G53ðDÞT conformation, and that the population-shift of the peptide-flip motion during binding to SH3c is rather small. The population shift can be calculated from the ratio of the dissociation constants for the "in" and "out" conformations and, thus, from the ratio of K G53ðDÞT d and K G53A d (Supplementary Methods). The binding-induced population shift of the peptide-flip motion in our Markov state model is also small, in agreement with the mutational data. As in previous molecular dynamics simulations 43 , the peptide-flip motion in our simulations is accelerated compared to the experiments. Similar to the peptide flip, the population shift of the pincer mode of ubiquitin 58 during binding to SH3c is rather small in the MSM (Supplementary Methods).
Besides the peptide-flip mode, an independent and similarly slow motion in our simulations and Markov modeling involves the flexible C-terminal tail of ubiquitin. In free ubiquitin, we observe two distinct compact and extended conformations of the C-terminal tail, which we define via time-lagged independent component analysis 59,60 of the C-terminal backbone torsion angles of ubiquitin, considering only the unbound states with p bind = 0 ("Methods"). Our Markov model constructed from 1.68 ms of binding simulations indicates that the population of the compact C-terminal conformation is strongly reduced during binding, and that this population reduction occurs prior to the transition state of binding, which is a clear signature of conformational selection. Fig. 3a illustrates the reactive flux between the dominant coarse-grained states of our MSM in binding direction. Along the binding pathways, the population of the compact C-terminal conformation diminishes from 21% (confidence interval (CI): 17-25%) for the compact, unbound state P 1 to 2.4% (CI: 1.7-3.4%) in the transition-state ensemble, which is composed of the states A, B, and C with intermediate binding probability 0.45 < p bind < 0.75, and remains low in the bound state F with a population value of 5.8% (CI: 4.2-7.8%). The vanishing population of the compact conformation in the transition state implies that productive binding events, across the transition state, are not possible in this conformation. Unlike the extended C-terminal conformation, the compact conformation sterically obstructs binding of SH3c to ubiquitin (Fig. 4). Consequently, the extended conformation of the C-terminus likely is the sought-after ubiquitin conformation selected for binding. Based on our fits of the k ex data, we expect a more drastic shift in populations for the conformational-selection mode, i.e., a larger population of the compact C-terminal conformation in the unbound state. However, the discrepancy we observe between experiment and modeling is within systematic errors in state-ofthe-art molecular dynamics force-fields 61,62 that were the basis of the MSM. Relative populations of alternative conformations are notoriously difficult to estimate from molecular dynamics simulations, because systematic errors of few kJ mol −1 can lead to large deviations in populations.
In summary, we introduce a litmus-test-like theoretical and experimental framework to identify conformational selection of transiently binding proteins on sub-millisecond timescales that are beyond the reach of standard stopped-flow mixing experiments or NMR methods relying on intermediate or slow exchange between bound and unbound protein forms. Our framework extends the time resolution in protein binding experiments in a way that is comparable to the timescale extension provided by temperature-jump experiments of protein folding relative to stopped-flow mixing experiments 63,64 . We e Unbinding rates k off obtained from fits with the conformational-selection model for ubiquitin residues (blue data points) and from fits with the two-state binding model for SH3c residues (red data points, Supplementary Figs. 3 and 4). f Conformational excitation rate k 12 from conformational-selection fits of ubiquitin residues (Supplementary Fig. 3). Global and residue-specific uncertainties in the R 2,eff values were estimated as described in Methods. The larger one of these two uncertainty estimates for each data is reported (smaller than the plot markers). The error bars in (c-f) represent standard errors of data fits ("Methods" and Supplementary Methods). Source data are provided as a Source Data file.

ARTICLE
NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-31374-5 expect that this framework will be applicable for many transient complexes. For the paradigmatic ubiquitin-SH3c complex, we identify conformational selection of ubiquitin, which agrees with the two-state recognition mechanism observed for the binding partner SH3c. In a complementary computational approach that involves molecular dynamics simulations and Markov modeling, we find that the ubiquitin conformation selected for binding exhibits a characteristically extended C-terminus. This framework makes future explorations possible to test the hypothesis that hub proteins such as ubiquitin utilize conformational selection as an evolutionary mechanism to be more adaptable. In the unbound state with compact C-terminal conformation P 1 , ubiquitin is shown in blue. SH3c as ligand L is shown in cyan. The relative probabilities of the compact and extended C-terminal conformation in the different states are indicated in blue and red. The probabilities p bind of the Markov states for reaching the native bound state prior to the fully unbound state are given at the bottom. b Coarse view of the binding mechanism with the unbound ubiquitin states P 1 and P 2 and the binding transition state P 2 L † and bound state P 2 L in which ubiquitin predominantly adopts the conformation P 2 with extended C-terminus. Da) and the purest fractions were pooled. For preparing the NMR sample the protein was dialyzed overnight against 20 mM sodium phosphate, pH 6.5, 100 m NaCl. After addition of 10 mM TCEP, 10% D2O (v/v) and 0.05% NaN 3 (w/v), the final protein concentration was adjusted to 2 mM. The volume of the NMR sample following the dialysis and concentration steps was 350 μL (5.95 mg in total). The spectra showed that the protein is folded (Fig. 5). All other chemicals were purchased from Merck, Sigma, Alfa Aesar and Multisyntech.

Methods
Crystallization of D-Thr 53 -ubiquitin, data collection and structure determination. Protein from the NMR sample was also used for crystallization.  74 . The crystal structure has the same overall fold as the wild-type ubiquitin except the region near residue position 53 (Fig. 5). 75,76 . All mutants and wild-type ubiquitin were isotopically labeled, expressed and purified as described 77 .
Kinetics of interconversion of free wt, G53A, and G53(D)T mutant. HSQC spectra were recorded for wt ubiquitin and its G53A and G53(D)T mutants. The chemical shifts of E24 amide are vastly different, by 1.981 ppm in the proton dimension due to the different chemical environment of the two mutants (Fig. 5).
The wild-type ubiquitin is a mixture of the "in" and "out" conformations of the peptide-flip motion, the chemical shift of the proton and nitrogen being more or less in the middle. The wild-type resonance measured at 400 MHz is visible only at 308 K. At 277, it is exchange broadened beyond detection. The HSQC of G53(D)T was measured at 308K in a 900 MHz spectrometer. The HSQC of the G53A mutant at 308 K was measured at 700 MHz spectrometer. To calculate the populations of the "in" and "out" conformation in the wt, we used the weighted average of the proton and nitrogen chemical shifts of E24 of the two mutants. The error in the position of the weakest peak E24 in wt was calculated as 1/2⋅linewidth/signal-tonoise. The error in peak position was propagated to obtain the population error of ±6%. The E24 in the "in" and "out" peptide-flip G53A and G53(D)T mutants are visible at 277 K, indicating that there is no exchange for the mutants indicating that they are locked in the "in" or "out" conformations, respectively. Both the G53A and G53(D)T mutants, corresponding to the "out" and "in" conformations are measured in 800 MHz spectrometer.
High-power relaxation dispersion of ubiquitin (with SH3c titrated in). The high-power relaxation experiments were measured using the 15 Figs. 7-12).
The constant volume of 200 μL of NMR samples was put inside 3 mm tubes (Hilgenberg GmbH) in 20 mM sodium phosphate buffer, pH 6.5, containing 100 mM NaCl, 10mM TCEP, 0.05% (w/v) sodium azide, and 10% D 2 O. In all experiments the ubiquitin ( 15 N labeled) concentration was 1 mM. The SH3c (unlabeled) concentration was varied from 0, 0.02, 0.05, 0.1, 0.25, 0.5 mM up to 1 mM. The probe temperature was calibrated using a digital thermometer and standard methanol sample.
The reference spectra were collected without the CPMG delay period (τ). The R 2,eff was calculated as where ν CPMG is the effective frequency of the CPMG field, (ν CPMG = 1/(4τ), where the time between the centers of consecutive 180 ∘ pulses is 2τ), T is the constant delay during which CPMG pulses were applied (60 ms), I 0 is the intensity of the peak in reference experiment and I(ν) is the intensity of the peak at that particular CPMG frequency. The CPMG delay (60 ms) was chosen such that the residual intensity was approximately 50% of maximum intensity. The experiment was performed with 3 s recycle delay between increments using 12 different refocusing field strengths between 0 and 6000 Hz collected in scrambled and interleaved manner with 1024 ( 1 H) and 130 ( 15 N) complex points, respectively. For each increment, 16 transients were measured following the Echo-AntiEcho scheme for signal averaging. There is a heat compensation block in the middle of the recycle delay to dump the extra CPMG cycles so that the total number of CPMG 180 ∘ refocusing pulses at fixed B 1 field strength is identical during the individual scans. The E-CPMG experiments took 3 days to complete, and standard 1 H, 15 N TROSY-HSQC spectra were collected before and after each experiment to monitor sample stability. A set of 5 non-exchanging residues were identified based on the criteria of lowest standard deviation between the R 2,eff values. The global uncertainty for the experimental data was calculated as the average of the standard deviations of the set of 5 residues 29 . The residue-specific uncertainties were calculated from measuring the deviation between R 2,eff values in repeat measurements at a suitable frequency (667 Hz). The largest of the global or residue-specific uncertainties is reported.
High-power relaxation dispersion of SH3c (with ubiquitin titrated in). The high-power relaxation dispersion on the 15 N labeled SH3c were measured at 277 K in Bruker Avance-III 800 MHz spectrometer equipped with cryoprobe-TCI. The refocusing pulses were applied with γB 1 /2π~5 kHz for 15 N in an interleaved manner with 3 s recovery delay. The spectra were recorded with 1024 and 156 complex points in the direct and indirect dimensions, respectively. The NMR experiments were performed with the 15 N-labeled CIN85-SH3 and unlabeled ubiquitin complex in 20 mM sodium phosphate buffer, pH 6.5, containing 100 mM NaCl, 10mM TCEP, 0.05% (w/v) sodium azide, and 10% D 2 O. In this experiment the SH3c ( 15 N labeled) concentration was kept fixed at 1 mM, and the ubiquitin (unlabeled) concentration was varied from 0, 0.02, 0.05, 0.075, 0.1, 0.15, 0.25, 0.5 mM, up to 1 mM. The R 2,eff values were measured at the same frequencies as the previous experiment.
Fast exchange of free and bound forms. Linear shifts of peaks in HSQC spectra upon titration indicate fast exchange of free and bound forms ( Supplementary  Fig. 1). Supplementary Fig. 1a shows a series of HSQC spectra upon titration of ubiquitin with SH3c, and Supplementary Fig. 1b shows the same for SH3c titrated with ubiquitin. The ratios between the two proteins ranged from 0 to 80% in the first case and from 0 to 78% in the second case. The amount of the bound complex was limited by the solubility of the proteins. The peaks shift linearly without indication of a third state. The linewidths both in the proton and nitrogen dimension increase with increasing concentration of the other component Fitting of relaxation dispersion data with a two-state exchange model. We fitted the relaxation rates R 2,eff with the fast-exchange formula 2,45 On the ubiquitin side, the two NMR data sets for R 2,eff as a function of ν = 1/ (4τ) at the two 15 N resonance frequencies 60.795 MHz and 96.313 MHz (blue and yellow data points in Supplementary Figs. 7 to 12, respectively) were jointly fitted using the four fit parameters R 2,0 (60.795 MHz), R 2,0 (96.313 MHz), ψ ex , and k ex . The fit results for the two-state exchange rate k ex at the different SH3c concentrations and ubiquitin residue positions are shown in Supplementary Table 2. We used the function NonlinearModelFit of Mathematica 11.3 78 in these fits. The errors ΔR 2,eff of the data points were included as weights 1=ðΔR 2;eff Þ 2 in the fitting, and the errors of the fit parameters were estimated from the fit residuals with the standard variance estimator function of NonlinearModelFit. Because of the typically smaller errors of the blue data points obtained at the 15 N resonance frequency 60.795 MHz, the joint fits of the data at both resonance frequencies tend to be more faithful to these blue data, compared to the yellow data points obtained at the 15 N resonance frequency 96.313 MHz (Supplementary Figs. 7-12).
On the SH3c side, the NMR data for R 2,eff as a function of ν at the 15 N resonance frequency of 81.1 MHz were fitted with the three fit parameters R 2,0 (81.1 MHz), ψ ex , and k ex . The fit results for k ex at the different ubiquitin concentrations and SH3c residue positions are shown in Supplementary Table 3. The errors were estimated from the fit residuals with the standard variance estimator function of NonlinearModelFit of Mathematica 11.3.
Molecular dynamics simulations of ubiquitin-SH3c binding. We adopted the coordinates from the complex (PDB-file 2K6D) as a starting point to generate the topology for our simulation system. Several N-and C-terminal residues were missing in the SH3c chain when compared to the experimental construct. Consequently, amino acids GHMDSRT and DFEKE were added respectively to the Nand C-termini of the SH3c chain, using PyMOL. We performed all equilibration simulations using GROMACS 5.1.4 79 . We separated the ubiquitin and SH3c chains into independent simulation systems. These chains were independently solvated; we added Na + and Cl − ions to neutralize the simulation box, which was then energy minimized and equilibrated in the NpT ensemble for 100 ps. Finally, we equilibrated for five nanoseconds in the NVT ensemble at 330K with the Amber99SB-ILDN forcefield 80 . We used an integration time-step of 2 fs, kept the simulation box temperature using the Bussi-thermostat 81 , and treated long-range electrostatics using the Particle Mesh Ewald method. In the simulations of ubiquitin, we used a cubic box with side-length 6.55 nn that contained 8863 TIP3P water molecules, and protonated His68 at Nϵ. In the simualtions of SH3c, we used a cubic box with side-length 6.53 nm that contained 9084 TIP3P water molecules, 6 Na + ions, and protonated His2 at Nϵ. Using PyMOL, we extract ten random configurations of the protein chains from the ubiquitin and SH3c equilibration simulations. We paired the ubiquitin and SH3c configurations together randomly, without replacement. Each pair of structures were placed randomly (non-overlapping) in a cubic box of side-length 10.0 nm. Using GROMACS 5.1.4, we solvated each of ten starting orientations of ubiquitin and SH3c in 31,817 TIP3P water molecules, adding 66 Na + and 60 Cl − ions to a final concentration of 100 mM NaCl. The total system size is 98,995 atoms. We use the Amber99SB-ILDN forcefield, to energy minimize the simulation box, followed by equilibration in the NpT ensemble for 100 ps to a final box size of 10.0 nm 3 . We export the final system coordinates for the initialization of production simulations on graphics processing units (GPUs) in OpenMM 7.5 82 .
In our production simulations 82 , we used hydrogen-mass repartitioning with heavy protons (4 amu) and constrained all covalent bonds to enable a 4 fs integration time step. We used the Amber99SB-ILDN forcefield for the protein chains, and TIP3P for the water molecules. The Particle mesh Ewald method was used to treat electrostatic interactions beyond 0.9 nm. We integrated the system using a Langevin integrator with a friction constant of 1 ps −1 and thermostatting to 300 K. We performed 200 ps equilibration simulations of each of the starting configurations in the NVT ensemble, and observed no energy or temperature drift after a few ps.
We ran 1015 simulations in total, across five adaptive rounds, with approximately 200 concurrent simulations per round. The longest simulations were 4 μs, and 50% of all simulations were between 811 ns and 2 μs. We used an adaptive sampling strategy to encourage sampling of transitions between the bound and unbound states, while allowing a diverse set of associated states which may or may not lead to productive binding events. For every adaptive sampling round, we selected new starting points manually through visual inspection of representative conformational states identified in preliminary MSMs. We saved system coordinates every 0.2 ns, but strided into 1 ns steps for all subsequent analyses. We discarded the first nanosecond from each simulation as equilibration.
Markov modeling. We built an MSM using features aiming to resolve the internal structural rearrangements in ubiquitin associated with its association to the SH3c domain using PyEMMA 2.5.7 and MDTraj 1.9.3 [83][84][85] . Consequently, we selected a concise set of features, based upon available structural models of ubiquitin:SH3c complexes (PDB: 2K6D and 2JT4) 46,47 . We used two groups of features. The first group is composed of the shortest inter-residue distances between all residue pair combinations listed in Supplementary Table 4. We employed time-lagged independent component analysis 59,60 (TICA) to reduce the dimension of these distances to six using a lag-time of 100 ns. These six dimensions represent native interface contacts in experimental models (PDB: 2K6D and 2JT4), which we combined with the shortest distance between ubiquitin and the N-and C-termini of SH3c (first 14 and last 10 residues) to a seven-dimensional space. The latter distance helps to resolve non-productive binding events. We clustered these radial features into 450 states, using k-means clustering.
The second group of features is composed of the cosines and sines of backbone torsions of the C-terminus of ubiquitin (residue [70][71][72][73][74][75][76]. We employed TICA to reduce the dimension of these angular features to two using a lag-time of 50 ns. The first of these TICs is used to define a C-terminal mode, which undergoes a significant population shift during binding (Supplementary Methods). In the TICA analysis, we considered only unbound states with a ubiquitin and SH3c inter-chain distance of at least 1 nm. We clustered this 2D space into 12 cluster centers.
Initially, we assigned bound configurations to one of the 450 states defined by the radial features and unbound configurations to one of the 12 states represented by the angular features. This procedure led to a total of 462 states. To resolve the peptide-flip mode, we further split all states into two separate states if a cluster center contains both in and out configurations. This step brought us to 924 states. To prune out weakly connected states and attenuate errors associated with our coarse system representation, we filtered our discrete state trajectories using a lowpass filter lag-time of 90 ns (Supplementary Methods). We selected features and split Markov states based on previously determined important structural features for intrinsic dynamics of ubiquitin as well as ubiquitin:SH3c binding. Our model does not resolve any internal degrees of freedom of the SH3c domain, and as such, the model only represents the encounter dynamics from the ubiquitin perspective. Following these steps we arrive at our molecular dynamics data mapped on to 607 Markov states, which we used for the posterior sampling of 5000 Markov state models at lag time 62 ns following the Bayesian formalism previously described 86 . We chose the lag time based on the implied timescales and on the populations of the 20 most highly populated states in the MSM as a function of lag-time ( Supplementary Fig. 13). The implied timescales are the global relaxation timescales predicted by the MSM, and were computed via the Eigenvalues of the transition probability matrix 31 . Both implied timescales and state populations are stable (within model uncertainty) at the chosen lag time. A Chapman-Kolmogorov test ( Supplementary Fig. 13) shows that the model is consistent with the simulation data on timescales longer than the lag time 31,50 . Repeated runs of this procedure led to slight variations in the final number of states due to the stochastic nature of k-means clustering. The binding dissociation constant K d predicted by the MSM agrees with the experimentally determined value within the statistical uncertainty (Fig. 6).
To facilitate structural analysis, we coarse-grained the MSM by using Perron cluster-cluster analysis (PCCA) into 15 metastable states 87 . The unbound state is not Fig. 6 Binding kinetics in the MSM. a On-rate k on , off-rate k off , and dissociation constant K d of ubiquitin and SH3c calculated from the MSM for the state threshold value p Ã bind ¼ 0:57 at which the experimental value for K d is obtained. Unbound/bound states of the MSM are defined as states with p Ã bind values smaller/larger than the state threshold value. Super and sub-scripts indicate a 95% confidence interval of the posterior distribution of the MSM transition matrix. b K d , k on , and k off , as a function of the state threshold value p Ã bind . The threshold p Ã bind ¼ 0:57 in (a) is located within a plausible transition state region. However, the entire range of predicted K d values for different threshold choices is within the expected error of current state-of-the-art force field. The rates in subscript and superscript in (a) and the error regions in (b) represent 95% confidence intervals of the posterior distribution of Markov models. metastable on the model lag-time. We consequently separate the unbound states into a separate set of states manually, as all Markov states with an average ubiquitin-SH3c distance of more than 1nm. We then group the unbound Markov states into extended and compact C-terminal states. This leaves us with 17 states in total, however, in the main text we only visualize the states involved with high net flux 40 > 10 −7 for visual clarity (Fig. 3). All the metastable states have substantial conformational flexibility which impedes detailed structural analysis of the individual states. We report key properties of the of the 17 states in Supplementary Table 5.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The NMR data of this study and the mass spectrometry data for the synthesis of the G53(D)Thr ubiquitin protein are available in the open research data repository Edmond at https://doi.org/10.17617/3.AVKYZC 88 . The molecular dynamics data of this study are available in the Edmond data repository at https://doi.org/10.17617/3.8o 89 . The structure factor file and the atomic coordinates of the G53(D)T mutant of ubiquitin have been deposited in the Protein Data Bank under the accession code 7OOJ. Previously published structures of ubiquitin-SH3c complexes used for a comparison to molecular dynamics conformations are available in the Protein Data Bank under the accession codes 2K6D and 2JT4. Source data are provided with this paper.