## Introduction

Protein-ligand or protein-protein interactions underpin biological control mechanisms, and a detailed kinetic understanding of the interactions with atomic resolution is necessary to develop drug molecules. The role of molecular motion in these interactions is a very long-standing question, especially in the regime of fast kinetics1. Therefore, characterization of protein-ligand or protein-protein (protein-partner) interactions has been of interest for a long time, and the development of new experimental methods is crucial2,3,4,5,6,7,8,9,10,11,12,13,14. This line of research is critical to disentangle almost all molecular recognition in a cell15, including understanding the binding mechanism in terms of two-state binding vs. three-state binding via conformational selection or induced fit16,17,18.

In conformational selection19,20 and induced fit21, a conformational change occurs either prior to or after binding (Fig. 1). Prominent examples for induced fit include conformational changes from an open to a closed protein conformation after ligand binding22. Here, induced fit as binding mechanism can be directly deduced from protein structures if the entrance to the ligand-binding site is sterically blocked in the closed conformation of the bound form22. Other prominent examples for induced fit are protein systems with two bound forms of disordered fragments observed in nuclear magnetic resonance (NMR) experiments23,24. In the pKID/KIX system, the exchange between the free form and the bound forms is slow on the chemical shift timescale, which results in distinct peaks of these forms in NMR spectra23. Three-state fitting of NMR relaxation dispersion data and characteristic chemical shift changes during titration then directly evidence the existence of a binding mechanism with three states, whose structural identity can also be derived from chemical shift changes. Conformational selection in protein binding has been pioneered in NMR experiments that demonstrated conformational exchanges in the free protein form that are comparable to structural changes between the free and bound forms7,25. Relaxation-dispersion NMR methods to characterize low-populated conformations in free protein forms have been recently extended26 using paramagnetically induced pseudocontact shifts to increase the chemical shift range between different conformations27. But as a binding mechanism, conformational selection requires the additional kinetic proof that excited states observed e.g. in NMR experiments of the free form are on-pathway in the binding reaction. For protein binding reactions with relaxation times of milliseconds to seconds, such a kinetic proof can be provided by stopped-flow mixing experiments18,28,29. However, a general approach to investigate protein binding mechanisms is missing on sub-millisecond time scales where stopped flow is too slow or where the exchange between the free and bound protein forms is fast on the NMR chemical shift timescale under all stoichiometric conditions.

Here, we report the development of a theoretical and experimental framework for investigating protein-partner interaction with recognition kinetics down to tens of microseconds with atomistic detail. The framework can be applied to any binding regime but is particularly insightful for weak, transient binding with off-rates koff larger than 1000 s−1. The kinetics of ligand binding are measured using high-power relaxation dispersion experiments, which have been shown to reveal kinetics down to single digit microseconds in individual proteins30. High-power relaxation dispersion is uniquely advantageous for measurement of both slow (<1000 s−1) and fast (up to ~ 37,000 s−1) kinetics in comparison to R1ρ or off-resonance R1ρ experiments regarding both experimental setup and data analysis30. We apply our framework to analyze the binding of the paradigmatic protein ubiquitin7 to its partner protein, the SH3c domain of CIN85. The interaction of ubiquitin and the SH3c domain is weak and transient (with dissociation constant Kd = 370 ± 15 μM from NMR titrations), akin to many other biologically important interactions. We show with the measurement of concentration-dependent kinetics using relaxation dispersion in the fast-exchange regime (Supplementary Fig. 1) that three-state binding via conformational selection dominates the kinetics of the binding on the side of ubiquitin. For the partner protein SH3c, we find consistence with two-state binding, in agreement with three-state conformational selection on the ubiquitin side. This concentration-dependent relaxation dispersion measurement and fitting procedure constitutes a litmus test for the recognition mechanism. In a subsequent step, we use molecular dynamics simulations and Markov state modeling31,32,33,34,35,36,37,38,39,40,41 to identify the ubiquitin conformation selected for binding. This binding-competent ubiquitin conformation exhibits a characteristically extended C-terminus.

Ubiquitin is a hub of the cellular interaction network. At the same time, CIN85 is an adapter molecule that controls the spatial and temporal assembly of multi-protein complexes by its three SH3 domains that bind other partners42. Thus, the interaction of ubiquitin with the third SH3 domain of CIN85 (SH3c) is the natural choice for testing our framework. Besides, given that the precise interaction of ubiquitin with many partners is a hallmark of cell function, it makes one wonder if there is a conformational selection mechanism in ubiquitin, where a minor “binding-compatible” conformation binds the partner specifically. Previous work has shown that free ubiquitin consists of an ensemble of conformations, including the “bound-like” conformations seen in ubiquitin complexes, thus supporting conformational selection as the binding mechanism7. Subsequent experimental and computational work has delineated the dynamic modes of ubiquitin in granular details43. Here we show how our expandable theoretical and experimental framework brings together the different internal dynamics and explains this paradigmatic protein-partner interaction.

## Results and discussion

### Measuring ligand-concentration dependent high-power relaxation dispersion enables distinguishing between binding mechanisms

We characterized both proteins starting with ubiquitin since we had previously demonstrated that binding-competent conformations exist in the ubiquitin ensemble in the absence of binding partners7. The presence of binding-competent conformations is a necessary but not sufficient condition for conformational selection as the question is essentially about binding kinetics16,44. We previously determined fast conformational transitions with exchange rates kex larger than about 20,000 s−1 in free ubiquitin using relaxation dispersion43. NMR titration indicates also fast exchange between the free and bound forms of ubiquitin and SH3c, because we observed only one cross peak for all ratios of ubiquitin and SH3c (Supplementary Fig. 1), and because the intensity of this peak decreased monotonously with increasing formation of the complex (Supplementary Fig. 1e, i). Thus, we determined the concentration-dependent exchange rate kex of the complex formation at different partner concentrations as in our previous experiments on free ubiquitin by fitting the relaxation rates R2,eff with the fast-exchange Luz-Meiboom equation45 (Fig. 2a, b and “Methods”). To gain insight on the binding mechanism from these experimentally determined kex values, we have developed analytical equations for the variation of kex with varying partner concentration, for two-state binding without (kinetically) relevant conformational change during binding, and for three-state binding with a conformational change prior to the binding step (conformational selection), or after binding (induced fit) (Fig. 1). These equations for kex hold at all concentrations of the proteins, in contrast to related equations for the dominant relaxation rate kobs of stopped-flow mixing experiments derived under the ‘pseudo-first assumption’ of an excess concentration of one of the binding partners (Supplementary Methods). The analytical equations guided us in selecting conditions for the measurement of kinetic parameters for the ubiquitin-SH3c system, starting with a ratio of 1 (SH3c) to 50 (ubiquitin) and increasing the concentration of SH3c to 1:1. This wide sub-stoichiometric range of ubiquitin-SH3c ratios allows to identify the slope and curvature of kex as a function of the SH3c concentration [L]0 and to determine the unbinding rate koff in the limit [L]0 to 0 (Fig. 1). The measurements reveal 22 ubiquitin residue positions with kex values that are clearly smaller than in free ubiquitin and, thus, reflect the exchange between the SH3c-bound and unbound state of ubiquitin (Supplementary Methods, Supplementary Table 2, and Supplementary Fig. 2). The exchange rate kex decreases with increasing total concentration of the binding partner SH3c at the large majority of the 22 residue positions (Fig. 2c and Supplementary Fig. 3), which signifies conformational selection and excludes two-state binding and induced fit (Fig. 1). The rate parameters for the conformational-selection model obtained from fitting of the concentration-dependent kex data at the 22 residues positions are overall consistent (Fig. 2e, f) and support conformational selection of a low-populated, excited ubiquitin conformation prior to binding to SH3c. Weighted averaging of the fitted rate parameters leads to the conformational excitation rate k12 = 1280 ± 170 s−1 and to the unbinding rate koff = k = 2420 ± 140 s−1 (dashed blue lines in Fig. 2e, f). The population k12/(k12 + k21) of the excited unbound ubiquitin conformation is not larger than about 6.5% because the rate k12 + k21 for the conformational exchange in free ubiquitin is not smaller than about 20,000 s−1 according to previous measurements43.

Measurements on the side of SH3c reveal 12 residue positions with kex values affected by ubiquitin as binding partner (Supplementary Methods, Supplementary Table 3, and Supplementary Fig. 2). The kex curves at these 12 residue positions are consistent with two-state binding, in agreement with a conformational-selection three-state binding mechanism for ubiquitin, in which SH3c has two states (Fig. 2d and Supplementary Fig. 4). Weighted averaging of the single fit parameter koff leads to koff = 1.43 ± 0.04 ms−1 (or 1430 ± 40 s−1), which is close to the unbinding rate koff obtained from the fits of the conformational-selection model on the side of ubiquitin.

### Markov modeling identifies a ubiquitin C-terminal mode as conformational-selection mode

To identify the ubiquitin conformation selected for binding, we carried out approximately 1.68 ms of molecular dynamics simulations and used these to build a Markov state model (MSM)31,32 that describes the conformational dynamics during binding as a kinetic network of metastable states (see Methods). The most stable state of the MSM is a structurally diverse, bound state that encompasses two published ubiquitin:SH3c models (PDB 2K6D and 2JT4)46,47. A comparison to previously reported distances derived from paramagnetic relaxation enhancement (PRE) measurements46 indicates that this bound state of our MSM recapitulates the experimental bound state well (Supplementary Fig. 5). Based on this bound state and an unbound state in which the distance of ubiquitin and SH3c is larger than 1 nm, we employ transition path theory48,49,50 to compute a committor probability, or binding probability, pbind that quantifies the progress along the binding transition paths of the MSM (see “Methods”). We use adaptive sampling to access intermediate and unbound states with pbind < 141. Overall, the MSM resolves the reversible binding process of ubiquitin and SH3c in atomic detail and predicts a dissociation constant of binding that agrees with the experimental value within the statistical uncertainty (“Methods”). Markov state modeling and molecular dynamics simulations have been previously used to investigate the conformational changes of proteins during binding to small ligands38,39,51,52,53 and the binding-induced folding of disordered peptides40,54,55,56,57.

A peptide-flip motion between “in” and “out” conformations of ubiquitin emerged as a slow motion in earlier work43 and, thus, as possible candidate of a conformational-selection mode. However, the previously described mutant G53A that locks ubiquitin almost fully into the “out” conformation along the peptide-flip motion and the novel G53(D)T mutant (chemically synthesized with (D)-Threonine at position 53 and E24 15N labeled as a reporter) that locks ubiquitin almost fully into the “in” conformation (see Methods) do not have a large effect on the dissociation constant Kd of ubiquitin and SH3c, with $${K}_{{{{{\rm{d}}}}}}^{{{{{{{{\rm{G53(D)T}}}}}}}}}=374\pm 48$$ μM, $${K}_{{{{{\rm{d}}}}}}^{{{{{{{{\rm{G53A}}}}}}}}}=537\pm 28$$ μM (Supplementary Fig. 6). These observations suggest that ubiquitin can bind SH3c in both the “in” and “out” conformation, and that the population-shift of the peptide-flip motion during binding to SH3c is rather small. The population shift can be calculated from the ratio of the dissociation constants for the “in” and “out” conformations and, thus, from the ratio of $${K}_{{{{{\rm{d}}}}}}^{{{{{{{{\rm{G53(D)T}}}}}}}}}$$ and $${K}_{{{{{\rm{d}}}}}}^{{{{{{{{\rm{G53A}}}}}}}}}$$ (Supplementary Methods). The binding-induced population shift of the peptide-flip motion in our Markov state model is also small, in agreement with the mutational data. As in previous molecular dynamics simulations43, the peptide-flip motion in our simulations is accelerated compared to the experiments. Similar to the peptide flip, the population shift of the pincer mode of ubiquitin58 during binding to SH3c is rather small in the MSM (Supplementary Methods).

Besides the peptide-flip mode, an independent and similarly slow motion in our simulations and Markov modeling involves the flexible C-terminal tail of ubiquitin. In free ubiquitin, we observe two distinct compact and extended conformations of the C-terminal tail, which we define via time-lagged independent component analysis59,60 of the C-terminal backbone torsion angles of ubiquitin, considering only the unbound states with pbind = 0 (“Methods”). Our Markov model constructed from 1.68 ms of binding simulations indicates that the population of the compact C-terminal conformation is strongly reduced during binding, and that this population reduction occurs prior to the transition state of binding, which is a clear signature of conformational selection. Fig. 3a illustrates the reactive flux between the dominant coarse-grained states of our MSM in binding direction. Along the binding pathways, the population of the compact C-terminal conformation diminishes from 21% (confidence interval (CI): 17–25%) for the compact, unbound state P1 to 2.4% (CI: 1.7–3.4%) in the transition-state ensemble, which is composed of the states A, B, and C with intermediate binding probability 0.45 < pbind < 0.75, and remains low in the bound state F with a population value of 5.8% (CI: 4.2–7.8%). The vanishing population of the compact conformation in the transition state implies that productive binding events, across the transition state, are not possible in this conformation. Unlike the extended C-terminal conformation, the compact conformation sterically obstructs binding of SH3c to ubiquitin (Fig. 4). Consequently, the extended conformation of the C-terminus likely is the sought-after ubiquitin conformation selected for binding. Based on our fits of the kex data, we expect a more drastic shift in populations for the conformational-selection mode, i.e., a larger population of the compact C-terminal conformation in the unbound state. However, the discrepancy we observe between experiment and modeling is within systematic errors in state-of-the-art molecular dynamics force-fields61,62 that were the basis of the MSM. Relative populations of alternative conformations are notoriously difficult to estimate from molecular dynamics simulations, because systematic errors of few kJ mol−1 can lead to large deviations in populations.

In summary, we introduce a litmus-test-like theoretical and experimental framework to identify conformational selection of transiently binding proteins on sub-millisecond timescales that are beyond the reach of standard stopped-flow mixing experiments or NMR methods relying on intermediate or slow exchange between bound and unbound protein forms. Our framework extends the time resolution in protein binding experiments in a way that is comparable to the timescale extension provided by temperature-jump experiments of protein folding relative to stopped-flow mixing experiments63,64. We expect that this framework will be applicable for many transient complexes. For the paradigmatic ubiquitin-SH3c complex, we identify conformational selection of ubiquitin, which agrees with the two-state recognition mechanism observed for the binding partner SH3c. In a complementary computational approach that involves molecular dynamics simulations and Markov modeling, we find that the ubiquitin conformation selected for binding exhibits a characteristically extended C-terminus. This framework makes future explorations possible to test the hypothesis that hub proteins such as ubiquitin utilize conformational selection as an evolutionary mechanism to be more adaptable.

## Methods

### Expression, purification and NMR sample preparation of the SH3c domain of human CIN85

The 15N-labeled SH3c domain of hCIN85 was recombinantly produced in Toronto minimal medium with 15N-NH4Cl (Sigma Aldrich) as nitrogen source according to a published protocol42. Briefly, a fragment of CIN85 comprising amino acids 263-333 was expressed in the bacterial strain BL21(DE3)Star (Invitrogen) as fusion protein with N-terminal His7-tag. After purification on a Ni-NTA ProtinoTM metal affinity column (Macherey-Nagel, Germany) the His7-tag was cleaved off with TEV-protease and removed by a second Ni-NTA ProtinoTM column purification step. The SH3c (this construct is referred to as SH3c in the main manuscript) domain was eluted in the flow-through and further purified by gel-filtration on a Superdex 75/16-60 column (GE Healthcare). The sample was dialyzed against NMR buffer (20 mM sodium phosphate, pH 6.5, 100 mM NaCl, 10 mM TCEP, 0.05% (w/v) NaN3) and the final concentration was adjusted to 2 mM.

### Chemical synthesis, folding, purification and NMR sample preparation of 15N-Glu24-labeled D-Thr53-ubiquitin

Synthetic 15N-Glu24-labeled D-Thr53-ubiquitin was produced by Fmoc protection-based65 linear solid-phase peptide synthesis (SPPS) with an automated microwave synthesizer (Liberty 1, CEM), similar to a published protocol for high-yield synthesis of ubiquitin66 (0.1 mM scale, fivefold excess of amino acid for coupling, capping was done with 20% acetic acid anhydride). Briefly, synthesis was performed on an Fmoc-Gly preloaded Wang resin67 (Novabiochem). Couplings of the protected amino acids (Novabiochem) were performed with HBTU/HOBT/DIEA reagent mix68 (Merck), except for 15N-labeled Fmoc-Glu(OtBu)-OH (Sigma Aldrich) that was coupled overnight at position 24 using HATU/DIEA reagent mix69 (Merck). From position 24 onward only half the resin was reacted. From residue 52 onward no microwave irradiation was used to prevent aspartimide formation. The Fmoc-D-Thr-OH amino acid was incorporated at position 53. After deprotection with 20% piperidine and cleavage from the resin and lyophilization, the raw peptide (220 mg) was dissolved at 10 mg/ml in DMSO and refolded at room temperature by dropwise dilution into buffer A (50 mM acetic acid, pH 4.5) to a final DMSO concentration of 2% (v/v). The refolded protein solution was sequentially purified on a 5 ml HiPrep SP XL cation exchange column (GE Healthcare) and on a Mono S HR5-5 cation exchange column (GE Healthcare). The correct mass of the purified protein was verified by LC-MS (column: XSelect Peptide CSH C18 XP column, 2.5 μm, 4.6 × 100 mm, 130 Å, Waters; Acquity Arc System, Waters with SQD2-Mass-Detector (Single Quadrupol)) after each column purification step. All fractions from both columns were investigated by LC–MS (molecular weight measured: 8612 Da, expected: 8610 Da) and the purest fractions were pooled. For preparing the NMR sample the protein was dialyzed overnight against 20 mM sodium phosphate, pH 6.5, 100 m NaCl. After addition of 10 mM TCEP, 10% D2O (v/v) and 0.05% NaN3 (w/v), the final protein concentration was adjusted to 2 mM. The volume of the NMR sample following the dialysis and concentration steps was 350 μL (5.95 mg in total). The spectra showed that the protein is folded (Fig. 5). All other chemicals were purchased from Merck, Sigma, Alfa Aesar and Multisyntech.

### Crystallization of D-Thr53-ubiquitin, data collection and structure determination

Protein from the NMR sample was also used for crystallization. Crystals were obtained at 20 °C by sitting drop vapor diffusion mixing 100 nL of protein solution with 100 nL of well solution (0.1 M Bis-Tris, pH 7.5, 20% polyethylene glycol monomethyl ether 2000, 50 mM CdCl2). For data collection crystals were soaked for 1 min in well buffer supplemented with 20% (v/v) glycerol. Data collection was performed at SLS Villigen, Switzerland (beamline PXII, Pilatus 6M detector). Data were processed with XDS70. Space group determination and statistical analysis (Supplementary Table 1) was performed with XPREP (Bruker AXS, Madison, Wisconsin, USA). The structure was solved at a resolution of 2.6 Å by molecular replacement with PHASER71 using the crystal structure of ubiquitin (PDB code: 1UBQ)72 as search model. Refinement (Supplementary Table 1) was performed with Refmac573 alternating with manual model building in Coot74. The crystal structure has the same overall fold as the wild-type ubiquitin except the region near residue position 53 (Fig. 5).75,76. All mutants and wild-type ubiquitin were isotopically labeled, expressed and purified as described77.

### Kinetics of interconversion of free wt, G53A, and G53(D)T mutant

HSQC spectra were recorded for wt ubiquitin and its G53A and G53(D)T mutants. The chemical shifts of E24 amide are vastly different, by 1.981 ppm in the proton dimension due to the different chemical environment of the two mutants (Fig. 5). The wild-type ubiquitin is a mixture of the “in” and “out” conformations of the peptide-flip motion, the chemical shift of the proton and nitrogen being more or less in the middle. The wild-type resonance measured at 400 MHz is visible only at 308 K. At 277, it is exchange broadened beyond detection. The HSQC of G53(D)T was measured at 308K in a 900 MHz spectrometer. The HSQC of the G53A mutant at 308 K was measured at 700 MHz spectrometer. To calculate the populations of the “in” and “out” conformation in the wt, we used the weighted average of the proton and nitrogen chemical shifts of E24 of the two mutants. The error in the position of the weakest peak E24 in wt was calculated as 1/2linewidth/signal-to-noise. The error in peak position was propagated to obtain the population error of ±6%. The E24 in the “in” and “out” peptide-flip G53A and G53(D)T mutants are visible at 277 K, indicating that there is no exchange for the mutants indicating that they are locked in the “in” or “out” conformations, respectively. Both the G53A and G53(D)T mutants, corresponding to the “out” and “in” conformations are measured in 800 MHz spectrometer.

### High-power relaxation dispersion of ubiquitin (with SH3c titrated in)

The high-power relaxation experiments were measured using the 15N based constant time E-CPMG experiment30 for quantifying micro-to-millisecond time-scale exchange process in ubiquitin in Bruker Avance 600 MHz and 950 MHz spectrometers fitted with cryoprobe-TCI (Neo console running Topspin 4.× (Bruker Biospin corporation)) at 277 K. The constant time (CT) CPMG delay is divided into two equal halves, sandwiching the U-element that ensures the equal contribution of anti-phase and in-phase relaxation to R2,eff in all frequencies. In the E-CPMG experiment performed here, refocusing pulses are applied with strong γB1/2π (7143 Hz and 7407 Hz for 15N in the 950 MHz and 600 MHz spectrometers respectively) fields (corresponding to 15N hard pulses) for all refocusing frequencies, thus reducing any off-resonance effects that can affect the measurement of R2,eff30. The R2,eff values were measured at CPMG frequencies (νCPMG) of 66.7, 133, 267, 400, 533, 667, 1333, 2000, 2667, 3333, 4000, 4667, 5333, and 6000 Hz (Supplementary Figs. 712).

The constant volume of 200 μL of NMR samples was put inside 3 mm tubes (Hilgenberg GmbH) in 20 mM sodium phosphate buffer, pH 6.5, containing 100 mM NaCl, 10mM TCEP, 0.05% (w/v) sodium azide, and 10% D2O. In all experiments the ubiquitin (15N labeled) concentration was 1 mM. The SH3c (unlabeled) concentration was varied from 0, 0.02, 0.05, 0.1, 0.25, 0.5 mM up to 1 mM. The probe temperature was calibrated using a digital thermometer and standard methanol sample.

The reference spectra were collected without the CPMG delay period (τ). The R2,eff was calculated as

$${R}_{2,{{{{{{{\rm{eff}}}}}}}}}({\nu }_{{{{{{{{\rm{CPMG}}}}}}}}})=-1/T\log \left(I({\nu }_{{{{{{{{\rm{CPMG}}}}}}}}})/{I}_{0}\right)$$
(1)

where νCPMG is the effective frequency of the CPMG field, (νCPMG = 1/(4τ), where the time between the centers of consecutive 180 pulses is 2τ), T is the constant delay during which CPMG pulses were applied (60 ms), I0 is the intensity of the peak in reference experiment and I(ν) is the intensity of the peak at that particular CPMG frequency. The CPMG delay (60 ms) was chosen such that the residual intensity was approximately 50% of maximum intensity. The experiment was performed with 3 s recycle delay between increments using 12 different refocusing field strengths between 0 and 6000 Hz collected in scrambled and interleaved manner with 1024 (1H) and 130 (15N) complex points, respectively. For each increment, 16 transients were measured following the Echo-AntiEcho scheme for signal averaging. There is a heat compensation block in the middle of the recycle delay to dump the extra CPMG cycles so that the total number of CPMG 180 refocusing pulses at fixed B1 field strength is identical during the individual scans. The E-CPMG experiments took 3 days to complete, and standard 1H, 15N TROSY-HSQC spectra were collected before and after each experiment to monitor sample stability. A set of 5 non-exchanging residues were identified based on the criteria of lowest standard deviation between the R2,eff values. The global uncertainty for the experimental data was calculated as the average of the standard deviations of the set of 5 residues29. The residue-specific uncertainties were calculated from measuring the deviation between R2,eff values in repeat measurements at a suitable frequency (667 Hz). The largest of the global or residue-specific uncertainties is reported.

### High-power relaxation dispersion of SH3c (with ubiquitin titrated in)

The high-power relaxation dispersion on the 15N labeled SH3c were measured at 277 K in Bruker Avance-III 800 MHz spectrometer equipped with cryoprobe-TCI. The refocusing pulses were applied with γB1/2π ~ 5 kHz for 15N in an interleaved manner with 3 s recovery delay. The spectra were recorded with 1024 and 156 complex points in the direct and indirect dimensions, respectively. The NMR experiments were performed with the 15N-labeled CIN85-SH3 and unlabeled ubiquitin complex in 20 mM sodium phosphate buffer, pH 6.5, containing 100 mM NaCl, 10mM TCEP, 0.05% (w/v) sodium azide, and 10% D2O. In this experiment the SH3c (15N labeled) concentration was kept fixed at 1 mM, and the ubiquitin (unlabeled) concentration was varied from 0, 0.02, 0.05, 0.075, 0.1, 0.15, 0.25, 0.5 mM, up to 1 mM. The R2,eff values were measured at the same frequencies as the previous experiment.

### Fast exchange of free and bound forms

Linear shifts of peaks in HSQC spectra upon titration indicate fast exchange of free and bound forms (Supplementary Fig. 1). Supplementary Fig. 1a shows a series of HSQC spectra upon titration of ubiquitin with SH3c, and Supplementary Fig. 1b shows the same for SH3c titrated with ubiquitin. The ratios between the two proteins ranged from 0 to 80% in the first case and from 0 to 78% in the second case. The amount of the bound complex was limited by the solubility of the proteins. The peaks shift linearly without indication of a third state. The linewidths both in the proton and nitrogen dimension increase with increasing concentration of the other component according to the increase of the effective molecular weight. No line broadening due to intermediate exchange is seen.

### Fitting of relaxation dispersion data with a two-state exchange model

We fitted the relaxation rates R2,eff with the fast-exchange formula2,45

$${R}_{2,{{{{{{{\rm{eff}}}}}}}}}={R}_{2,0}({B}_{0})+\frac{{\psi }_{{{{{{{{\rm{ex}}}}}}}}}\,{{{{{{{{\rm{{B}}}}}}}_{0}}}}^{2}}{{k}_{{{{{{{{\rm{ex}}}}}}}}}}\left(1-\frac{4\nu }{{k}_{{{{{{{{\rm{ex}}}}}}}}}}\tanh \left[\frac{{k}_{{{{{{{{\rm{ex}}}}}}}}}}{4\nu }\right]\right)$$
(2)

On the ubiquitin side, the two NMR data sets for R2,eff as a function of ν = 1/(4τ) at the two 15N resonance frequencies 60.795 MHz and 96.313 MHz (blue and yellow data points in Supplementary Figs. 7 to 12, respectively) were jointly fitted using the four fit parameters R2,0(60.795 MHz), R2,0(96.313 MHz), ψex, and kex. The fit results for the two-state exchange rate kex at the different SH3c concentrations and ubiquitin residue positions are shown in Supplementary Table 2. We used the function NonlinearModelFit of Mathematica 11.378 in these fits. The errors ΔR2,eff of the data points were included as weights $$1/{({{\Delta }}{R}_{2,{{{{{{{\rm{eff}}}}}}}}})}^{2}$$ in the fitting, and the errors of the fit parameters were estimated from the fit residuals with the standard variance estimator function of NonlinearModelFit. Because of the typically smaller errors of the blue data points obtained at the 15N resonance frequency 60.795 MHz, the joint fits of the data at both resonance frequencies tend to be more faithful to these blue data, compared to the yellow data points obtained at the 15N resonance frequency 96.313 MHz (Supplementary Figs. 712).

On the SH3c side, the NMR data for R2,eff as a function of ν at the 15N resonance frequency of 81.1 MHz were fitted with the three fit parameters R2,0(81.1 MHz), ψex, and kex. The fit results for kex at the different ubiquitin concentrations and SH3c residue positions are shown in Supplementary Table 3. The errors were estimated from the fit residuals with the standard variance estimator function of NonlinearModelFit of Mathematica 11.3.

### Molecular dynamics simulations of ubiquitin-SH3c binding

We adopted the coordinates from the complex (PDB-file 2K6D) as a starting point to generate the topology for our simulation system. Several N- and C-terminal residues were missing in the SH3c chain when compared to the experimental construct. Consequently, amino acids GHMDSRT and DFEKE were added respectively to the N- and C-termini of the SH3c chain, using PyMOL. We performed all equilibration simulations using GROMACS 5.1.479. We separated the ubiquitin and SH3c chains into independent simulation systems. These chains were independently solvated; we added Na+ and Cl ions to neutralize the simulation box, which was then energy minimized and equilibrated in the NpT ensemble for 100 ps. Finally, we equilibrated for five nanoseconds in the NVT ensemble at 330K with the Amber99SB-ILDN forcefield80. We used an integration time-step of 2 fs, kept the simulation box temperature using the Bussi-thermostat81, and treated long-range electrostatics using the Particle Mesh Ewald method. In the simulations of ubiquitin, we used a cubic box with side-length 6.55 nn that contained 8863 TIP3P water molecules, and protonated His68 at Nϵ. In the simualtions of SH3c, we used a cubic box with side-length 6.53 nm that contained 9084 TIP3P water molecules, 6 Na+ ions, and protonated His2 at Nϵ. Using PyMOL, we extract ten random configurations of the protein chains from the ubiquitin and SH3c equilibration simulations. We paired the ubiquitin and SH3c configurations together randomly, without replacement. Each pair of structures were placed randomly (non-overlapping) in a cubic box of side-length 10.0 nm.

Using GROMACS 5.1.4, we solvated each of ten starting orientations of ubiquitin and SH3c in 31,817 TIP3P water molecules, adding 66 Na+ and 60 Cl ions to a final concentration of 100 mM NaCl. The total system size is 98,995 atoms. We use the Amber99SB-ILDN forcefield, to energy minimize the simulation box, followed by equilibration in the NpT ensemble for 100 ps to a final box size of 10.0 nm3. We export the final system coordinates for the initialization of production simulations on graphics processing units (GPUs) in OpenMM 7.582.

In our production simulations82, we used hydrogen-mass repartitioning with heavy protons (4 amu) and constrained all covalent bonds to enable a 4 fs integration time step. We used the Amber99SB-ILDN forcefield for the protein chains, and TIP3P for the water molecules. The Particle mesh Ewald method was used to treat electrostatic interactions beyond 0.9 nm. We integrated the system using a Langevin integrator with a friction constant of 1 ps−1 and thermostatting to 300 K. We performed 200 ps equilibration simulations of each of the starting configurations in the NVT ensemble, and observed no energy or temperature drift after a few ps.

We ran 1015 simulations in total, across five adaptive rounds, with approximately 200 concurrent simulations per round. The longest simulations were 4 μs, and 50% of all simulations were between 811 ns and 2 μs. We used an adaptive sampling strategy to encourage sampling of transitions between the bound and unbound states, while allowing a diverse set of associated states which may or may not lead to productive binding events. For every adaptive sampling round, we selected new starting points manually through visual inspection of representative conformational states identified in preliminary MSMs. We saved system coordinates every 0.2 ns, but strided into 1 ns steps for all subsequent analyses. We discarded the first nanosecond from each simulation as equilibration.

### Markov modeling

We built an MSM using features aiming to resolve the internal structural rearrangements in ubiquitin associated with its association to the SH3c domain using PyEMMA 2.5.7 and MDTraj 1.9.383,84,85. Consequently, we selected a concise set of features, based upon available structural models of ubiquitin:SH3c complexes (PDB: 2K6D and 2JT4)46,47. We used two groups of features. The first group is composed of the shortest inter-residue distances between all residue pair combinations listed in Supplementary Table 4. We employed time-lagged independent component analysis59,60 (TICA) to reduce the dimension of these distances to six using a lag-time of 100 ns. These six dimensions represent native interface contacts in experimental models (PDB: 2K6D and 2JT4), which we combined with the shortest distance between ubiquitin and the N- and C-termini of SH3c (first 14 and last 10 residues) to a seven-dimensional space. The latter distance helps to resolve non-productive binding events. We clustered these radial features into 450 states, using k-means clustering.

The second group of features is composed of the cosines and sines of backbone torsions of the C-terminus of ubiquitin (residue 70-76). We employed TICA to reduce the dimension of these angular features to two using a lag-time of 50 ns. The first of these TICs is used to define a C-terminal mode, which undergoes a significant population shift during binding (Supplementary Methods). In the TICA analysis, we considered only unbound states with a ubiquitin and SH3c inter-chain distance of at least 1 nm. We clustered this 2D space into 12 cluster centers.

Initially, we assigned bound configurations to one of the 450 states defined by the radial features and unbound configurations to one of the 12 states represented by the angular features. This procedure led to a total of 462 states. To resolve the peptide-flip mode, we further split all states into two separate states if a cluster center contains both in and out configurations. This step brought us to 924 states. To prune out weakly connected states and attenuate errors associated with our coarse system representation, we filtered our discrete state trajectories using a low-pass filter lag-time of 90 ns (Supplementary Methods). We selected features and split Markov states based on previously determined important structural features for intrinsic dynamics of ubiquitin as well as ubiquitin:SH3c binding. Our model does not resolve any internal degrees of freedom of the SH3c domain, and as such, the model only represents the encounter dynamics from the ubiquitin perspective. Following these steps we arrive at our molecular dynamics data mapped on to 607 Markov states, which we used for the posterior sampling of 5000 Markov state models at lag time 62 ns following the Bayesian formalism previously described86. We chose the lag time based on the implied timescales and on the populations of the 20 most highly populated states in the MSM as a function of lag-time (Supplementary Fig. 13). The implied timescales are the global relaxation timescales predicted by the MSM, and were computed via the Eigenvalues of the transition probability matrix31. Both implied timescales and state populations are stable (within model uncertainty) at the chosen lag time. A Chapman–Kolmogorov test (Supplementary Fig. 13) shows that the model is consistent with the simulation data on timescales longer than the lag time31,50. Repeated runs of this procedure led to slight variations in the final number of states due to the stochastic nature of k-means clustering. The binding dissociation constant Kd predicted by the MSM agrees with the experimentally determined value within the statistical uncertainty (Fig. 6).

To facilitate structural analysis, we coarse-grained the MSM by using Perron cluster–cluster analysis (PCCA) into 15 metastable states87. The unbound state is not metastable on the model lag-time. We consequently separate the unbound states into a separate set of states manually, as all Markov states with an average ubiquitin-SH3c distance of more than 1nm. We then group the unbound Markov states into extended and compact C-terminal states. This leaves us with 17 states in total, however, in the main text we only visualize the states involved with high net flux40 > 10−7 for visual clarity (Fig. 3). All the metastable states have substantial conformational flexibility which impedes detailed structural analysis of the individual states. We report key properties of the of the 17 states in Supplementary Table 5.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.