Tandem domain structure determination based on a systematic enumeration of conformations

Protein structure determination is undergoing a change of perspective due to the growing importance in biology of the disordered regions of biomolecules. In such cases, the convergence criterion is more difficult to set up and the size of the conformational space is an obstacle to exhaustive exploration. A pipeline is proposed here to exhaustively sample protein conformations using backbone angle limits obtained by nuclear magnetic resonance (NMR), and then to determine the populations of conformations. The pipeline is applied to a tandem domain of the protein whirlin. An original approach, derived from a reformulation of the Distance Geometry Problem, is used to enumerate the conformations of the linker connecting the two domains. Specifically designed procedures then make it possible to assemble the domains with the linker conformations and to optimize the tandem domain conformations with respect to two sets of NMR measurements: residual dipolar couplings and paramagnetic resonance enhancements. The relative populations of the optimized conformations are finally determined by fitting small-angle X-ray scattering (SAXS) data. The most populated conformation of the tandem domain is a semi-closed one, with fully closed and more extended conformations in the minority, in agreement with previous observations. The SAXS and NMR data show different influences on the determination of populations.

to the experiments, and scattering vectors q ranged from 0.119 to 4.007 nm⁻¹. More information is given in the section "Method details" in Delhommel et al. 1

2 Molecular dynamics refinement in implicit solvent
Molecular dynamics (MD) trajectories were used to relax several systems during the procedure of assembling domains P1 and P2 to Lnk. The MD trajectories were recorded using NAMD 2.13. 2 Topology parameters were taken from the force fields c36 3 and c36m. 4 The simulations were performed in Generalized Born implicit solvent (GBIS) 5 at a temperature of 300 K. An ion concentration of 0.3 M and a cutoff of 12 Å for calculating Born radii were used. A cutoff of 14 Å and a switching distance of 13 Å were defined for non-bonded interactions. The RATTLE algorithm 6,7 was used to keep all covalent bonds involving hydrogens rigid, enabling a time step of 2 fs. Temperature was regulated with a Langevin thermostat. 8 At the beginning of each trajectory, the system was first minimized for 1,000 steps, then heated up gradually from 0 K to 300 K in 30,000 integration steps. Finally, the system was equilibrated for 5,000 steps. During all steps, from minimization to production, positional restraints were applied on specific regions of the system with a constant force of 1 kcal/mol. A production run of 100 ps was then performed and the conformation of the final frame was saved as the relaxed conformation.
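As a quick check of the durations implied by this protocol, the step counts can be converted to simulated time with the 2 fs time step. The following is a minimal sketch: the function names are illustrative, and the linear shape of the heating ramp is an assumption, since the text only states that heating was gradual. Minimization steps are not time steps and are left out.

```python
# Timeline of one relaxation trajectory, using the step counts quoted in the text.
TIMESTEP_FS = 2.0  # integration time step enabled by RATTLE


def steps_to_ps(n_steps, timestep_fs=TIMESTEP_FS):
    """Convert a number of integration steps to picoseconds."""
    return n_steps * timestep_fs / 1000.0


def ps_to_steps(duration_ps, timestep_fs=TIMESTEP_FS):
    """Convert a duration in picoseconds to a whole number of integration steps."""
    return round(duration_ps * 1000.0 / timestep_fs)


heating_ps = steps_to_ps(30_000)       # heating from 0 K to 300 K
equilibration_ps = steps_to_ps(5_000)  # equilibration at 300 K
production_steps = ps_to_steps(100.0)  # 100 ps production run

# Average heating rate, ASSUMING a linear ramp (the ramp shape is not stated).
heating_rate_k_per_ps = 300.0 / heating_ps

print(heating_ps, equilibration_ps, production_steps, heating_rate_k_per_ps)
```

Under these assumptions, the heating and equilibration phases together are shorter than the 100 ps production run.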

Optimization of P1LnkP2 conformations along the RDCs
The 33 selected conformations of P1LnkP2 displaying an R correlation factor smaller than 0.7 between experimental RDCs and RDCs calculated by PALES 2.1 9 were optimized along RDCs using the XPLOR-NIH 3.1 10 script refine.py. This script, available as an example in the XPLOR-NIH package, implements a slow cooling protocol in torsion angle space. The high temperature of this protocol was set to 300 K, as only a simple optimization of the structure to fit the RDC measurements was sought. The final temperature of the system was 25 K. The RDC refinement was performed using the residual dipolar couplings (RDC) measured previously 1 on the NH groups of residues 137-222 (P1) and 281-374 (P2), using the axial and rhombic components determined by PALES on each initial P1LnkP2 conformation. Restraints on φ and ψ backbone dihedral angles were applied on residues of P1 and P2 detected by STRIDE 11 as α-helix or β-strand in the starting P1LnkP2 conformation.
The target values of the φ, ψ restraints were the values observed in the starting P1LnkP2 conformation, and an interval of 20° was allowed around the target value. Twenty P1LnkP2 conformations were produced by the refinement procedure, from which the conformation displaying the smallest total energy was selected.
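The agreement measures used to select conformations for RDC optimization can be illustrated as follows. This is a minimal sketch, not the PALES implementation: the Q factor is taken here as the RMS deviation divided by the RMS of the experimental couplings, which is one common definition and an assumption with respect to this work, and the function name is hypothetical.

```python
import math


def rdc_agreement(d_exp, d_calc):
    """Return (Q factor, RMS deviation in Hz, Pearson R) for two lists of RDCs.

    Q is computed as rms(calc - exp) / rms(exp); this normalization is an
    assumption and may differ from the one applied by PALES.
    """
    n = len(d_exp)
    rms = math.sqrt(sum((c - e) ** 2 for e, c in zip(d_exp, d_calc)) / n)
    q_factor = rms / math.sqrt(sum(e * e for e in d_exp) / n)

    mean_e = sum(d_exp) / n
    mean_c = sum(d_calc) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(d_exp, d_calc))
    var_e = sum((e - mean_e) ** 2 for e in d_exp)
    var_c = sum((c - mean_c) ** 2 for c in d_calc)
    r_corr = cov / math.sqrt(var_e * var_c)
    return q_factor, rms, r_corr


# Synthetic values, for illustration only (not data from this work).
q, rms, r = rdc_agreement([-8.0, 3.5, 12.0, -1.5], [-6.0, 2.0, 9.5, 0.5])
needs_refinement = r < 0.7  # selection threshold quoted in the text
```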

Optimization of P1LnkP2 conformations along the PREs
The 10 conformations (13, 20, 55, 145, 146, 150, 153, 160, 174, 186) displaying a distance between the geometric centers of P1 and P2 smaller than 32 Å were refined with respect to the measured paramagnetic enhancements (PRE) using the same script refine.py and the same conditions as those previously used for RDC refinement. In addition to RDCs and φ, ψ restraints, the volume of P1LnkP2 was restrained to 3500 Å³ using the potential gyrPot. 12
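The distance criterion used to pick these conformations can be sketched as below. The set of atoms entering each geometric center is not specified in the text, so the sketch simply averages whatever coordinates are passed in; the function names are hypothetical.

```python
import math

CLOSED_CUTOFF = 32.0  # Å, threshold quoted in the text


def geometric_center(coords):
    """Unweighted mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n for k in range(3))


def center_distance(coords_p1, coords_p2):
    """Distance between the geometric centers of two domains."""
    return math.dist(geometric_center(coords_p1), geometric_center(coords_p2))


def is_closed(coords_p1, coords_p2, cutoff=CLOSED_CUTOFF):
    """True when the two domain centers are closer than the cutoff."""
    return center_distance(coords_p1, coords_p2) < cutoff
```

Applied to each P1LnkP2 conformation, `is_closed` would return `True` only for the conformations passed on to PRE refinement, provided the same atom selection is used for both domains.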

Determination of the P1LnkP2 populations
The relative populations of the 83 P1LnkP2 conformations were determined by fitting the previously measured SAXS curve. 1 Two programs were used in parallel: BioEn 0.1.1, 13 based on optimization in a Bayesian framework, and Mesmer 1.0.0, 14 based on a genetic algorithm. For each considered conformation, a theoretical SAXS curve was calculated using CRYSOL, 15 available in the package ATSAS 3.0.3, 16 with 857 points, a maximum scattering vector of 4 nm⁻¹ and a maximum order of harmonics of 18. A 1D cubic interpolation 17 was used to obtain the theoretical SAXS values at the same set of scattering vectors q as those at which the experimental SAXS curve was recorded.
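The quantity compared with experiment in such a population fit is the population-weighted average of the theoretical curves. The sketch below shows this averaging together with a χ² per data point that includes a single least-squares scale factor. This is a generic formulation chosen for illustration, not the objective function of BioEn or Mesmer, and all names are hypothetical; the curves are assumed to be already interpolated onto the experimental q values.

```python
def ensemble_curve(populations, curves):
    """Population-weighted average of theoretical SAXS curves.

    `curves` holds one list of intensities per conformation, all sampled
    at the same q values; populations are normalized to sum to 1.
    """
    total = sum(populations)
    n_points = len(curves[0])
    return [
        sum(p * curve[j] for p, curve in zip(populations, curves)) / total
        for j in range(n_points)
    ]


def scaled_chi2(i_exp, sigma, i_model):
    """Return (chi2 per data point, scale factor) for a model curve.

    The scale factor is the closed-form least-squares optimum; using a
    single scale factor is an assumption of this sketch.
    """
    num = sum(e * m / s**2 for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / s**2 for m, s in zip(i_model, sigma))
    scale = num / den
    chi2 = sum(((e - scale * m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    return chi2 / len(i_exp), scale
```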
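The processing with BioEn was performed in the following way. The optimization was run for 1000 steps using the GSL library. 18 Ten runs were performed independently on all

6.1 Calculation of the backbone angle φ using covalent geometry along with the distance between the carbonyls of two successive residues
The dihedral angle $\phi_i$ between atoms $C_{i-1}$, $N_i$, $C^{\alpha}_i$ and $C_i$, i being the residue number, is defined using the trihedron cosine law as:

\[ \cos\phi_i = \frac{\cos\gamma - \cos\alpha\,\cos\beta}{\sin\alpha\,\sin\beta} \tag{1} \]

where the angles α, β and γ are respectively the angles between atoms ($C_{i-1}$, $N_i$, $C^{\alpha}_i$), ($C^{\alpha}_i$, $N_i$, $C_i$) and ($C_{i-1}$, $N_i$, $C_i$), each with its vertex at $N_i$. The angle α is defined by the force field, as the atoms $C_{i-1}$, $N_i$ and $C^{\alpha}_i$ are connected by two covalent bonds:

\[ \alpha = \mathrm{ab}(C_{i-1}, N_i, C^{\alpha}_i) \tag{2} \]

As, due to the protein stereochemistry, α is in the 0-180° range, $\sin\alpha = \sqrt{1 - \cos^2\alpha}$.
Using the cosine law between atoms $C^{\alpha}_i$, $N_i$ and $C_i$, cos β can be expressed as:

\[ \cos\beta = \frac{d(N_i, C^{\alpha}_i)^2 + d(N_i, C_i)^2 - d(C^{\alpha}_i, C_i)^2}{2\, d(N_i, C^{\alpha}_i)\, d(N_i, C_i)} \tag{3} \]

Due to the protein stereochemistry, β is in the 0-180° range and $\sin\beta = \sqrt{1 - \cos^2\beta}$. Using the cosine law between atoms $C_{i-1}$, $N_i$ and $C_i$, cos γ can be expressed as:

\[ \cos\gamma = \frac{d(C_{i-1}, N_i)^2 + d(N_i, C_i)^2 - d(C_{i-1}, C_i)^2}{2\, d(C_{i-1}, N_i)\, d(N_i, C_i)} \tag{4} \]

The distance $d(N_i, C_i)$, which is the only parameter not defined by the force field in Eqs. 3 and 4 apart from the input distance $d(C_{i-1}, C_i)$ between carbonyls, can be calculated using another cosine law expression between atoms $N_i$, $C^{\alpha}_i$ and $C_i$:

\[ d(N_i, C_i)^2 = d(N_i, C^{\alpha}_i)^2 + d(C^{\alpha}_i, C_i)^2 - 2\, d(N_i, C^{\alpha}_i)\, d(C^{\alpha}_i, C_i) \cos\delta \tag{5} \]

where the angle δ is obtained from the force field: $\delta = \mathrm{ab}(N_i, C^{\alpha}_i, C_i)$.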
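The chain of cosine laws described in this subsection can be followed numerically. The sketch below is a minimal illustration, assuming the trihedron cosine law in its usual form, cos φ = (cos γ − cos α cos β) / (sin α sin β). The bond lengths and angles are rounded illustrative values, not the c36 force field parameters, and the function and constant names are hypothetical. Only cos φ is returned: a single distance does not fix the sign of the dihedral angle.

```python
import math

# Rounded illustrative geometry (ASSUMED values, not the c36 parameters).
B_C_N = 1.33    # Å, d(C_{i-1}, N_i)
B_N_CA = 1.46   # Å, d(N_i, CA_i)
B_CA_C = 1.52   # Å, d(CA_i, C_i)
ANG_C_N_CA = math.radians(122.0)  # alpha = ab(C_{i-1}, N_i, CA_i)
ANG_N_CA_C = math.radians(111.0)  # delta = ab(N_i, CA_i, C_i)


def cos_dihedral_from_distance(d14, b12, b23, b34, ang123, ang234):
    """Cosine of the dihedral 1-2-3-4 from the 1-4 distance.

    For phi: atoms 1..4 are C_{i-1}, N_i, CA_i, C_i and d14 = d(C_{i-1}, C_i).
    Angles are in radians; the trihedron apex is atom 2.
    """
    # 2-4 distance from the cosine law in triangle (2, 3, 4), angle delta at atom 3
    d24_sq = b23**2 + b34**2 - 2.0 * b23 * b34 * math.cos(ang234)
    d24 = math.sqrt(d24_sq)
    # beta: angle at atom 2 between atoms 3 and 4
    cos_beta = (b23**2 + d24_sq - b34**2) / (2.0 * b23 * d24)
    sin_beta = math.sqrt(max(0.0, 1.0 - cos_beta**2))
    # gamma: angle at atom 2 between atoms 1 and 4
    cos_gamma = (b12**2 + d24_sq - d14**2) / (2.0 * b12 * d24)
    # alpha: bond angle 1-2-3, in the 0-180 degree range so sin(alpha) >= 0
    cos_alpha, sin_alpha = math.cos(ang123), math.sin(ang123)
    cos_phi = (cos_gamma - cos_alpha * cos_beta) / (sin_alpha * sin_beta)
    return max(-1.0, min(1.0, cos_phi))  # guard against round-off


# Example: magnitude of phi for an inter-carbonyl distance of 3.0 Å.
abs_phi_deg = math.degrees(
    math.acos(cos_dihedral_from_distance(3.0, B_C_N, B_N_CA, B_CA_C,
                                         ANG_C_N_CA, ANG_N_CA_C))
)
```

The function is written for a generic atom quadruplet, so the same call with atoms $N_i$, $C^{\alpha}_i$, $C_i$, $N_{i+1}$ and the distance between successive nitrogens gives cos ψ.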
6.2 Calculation of the backbone angle ψ using covalent geometry along with the distance between the nitrogens of two successive residues
The dihedral angle $\psi_i$ between atoms $N_i$, $C^{\alpha}_i$, $C_i$ and $N_{i+1}$, i being the residue number, is defined using the trihedron cosine law as:

\[ \cos\psi_i = \frac{\cos\gamma - \cos\alpha\,\cos\beta}{\sin\alpha\,\sin\beta} \tag{6} \]

where the angles α, β and γ are defined as the angles between atoms ($N_i$, $C^{\alpha}_i$, $C_i$), ($C_i$, $C^{\alpha}_i$, $N_{i+1}$) and ($N_i$, $C^{\alpha}_i$, $N_{i+1}$), each with its vertex at $C^{\alpha}_i$. The angle α is defined by the force field, as the atoms $N_i$, $C^{\alpha}_i$ and $C_i$ are connected by two covalent bonds:

\[ \alpha = \mathrm{ab}(N_i, C^{\alpha}_i, C_i) \tag{7} \]

and, because of the protein stereochemistry, α is in the 0-180° range and $\sin\alpha = \sqrt{1 - \cos^2\alpha}$.
Using the cosine law between atoms $C_i$, $C^{\alpha}_i$ and $N_{i+1}$, cos β can be expressed as:

\[ \cos\beta = \frac{d(C^{\alpha}_i, C_i)^2 + d(C^{\alpha}_i, N_{i+1})^2 - d(C_i, N_{i+1})^2}{2\, d(C^{\alpha}_i, C_i)\, d(C^{\alpha}_i, N_{i+1})} \tag{8} \]

Due to the protein stereochemistry, β is in the 0-180° range and $\sin\beta = \sqrt{1 - \cos^2\beta}$. Using the cosine law between the atoms $N_i$, $C^{\alpha}_i$ and $N_{i+1}$, cos γ can be expressed as:

\[ \cos\gamma = \frac{d(N_i, C^{\alpha}_i)^2 + d(C^{\alpha}_i, N_{i+1})^2 - d(N_i, N_{i+1})^2}{2\, d(N_i, C^{\alpha}_i)\, d(C^{\alpha}_i, N_{i+1})} \tag{9} \]

The distance $d(C^{\alpha}_i, N_{i+1})$, which is the only parameter not defined by the force field in Eqs. 8 and 9 apart from the input distance $d(N_i, N_{i+1})$ between nitrogens, can be calculated using another cosine law expression between atoms $C^{\alpha}_i$, $C_i$ and $N_{i+1}$:

\[ d(C^{\alpha}_i, N_{i+1})^2 = d(C^{\alpha}_i, C_i)^2 + d(C_i, N_{i+1})^2 - 2\, d(C^{\alpha}_i, C_i)\, d(C_i, N_{i+1}) \cos\delta \tag{10} \]

where the angle δ is obtained from the force field: $\delta = \mathrm{ab}(C^{\alpha}_i, C_i, N_{i+1})$.

Table S1. See caption on the next page.

Table S4. Q factor, RMS (Hz) and R correlation factor calculated between experimental and calculated RDCs for each conformation shown in Figure 3 of the main text. The conformations belonging to the set of closed conformations are written in bold.

Table S5. Atom re-ordering used during the iBP calculation step within the first, the last and the inner residues of the peptide fragment. The order is described by the list of atom names, the signs "-" and "+" describing atoms located in the previous and the next residues in the primary sequence.

Table S7. Population results obtained using only the SAXS data measured for scattering vectors q up to 3.5 nm⁻¹. Populations of conformations found using BioEn 0.1.1 13 on various sets of conformations including the 73 P1LnkP2 conformations for which the distance between the geometric centers of P1 and P2 was larger than 32 Å and various closed conformations among 13, 20, 55, 150, previously selected according to the fit of PRE data. After ten runs starting from random values of populations and performed on the whole set of conformations, all conformations for which the sum of populations over the ten runs was larger than 0.01 were gathered, and ten additional BioEn calculations were performed on this reduced set of conformations. The average and standard deviation values of populations obtained for each selected conformation from the second set of BioEn runs are given in the table, along with the final average values of χ² and of entropy S. In each calculation, the conformations tagged as "not incl" were not initially included in the calculation, whereas the conformations tagged as "-" were included in the calculation, but not selected by BioEn. 13 The conformations belonging to the set of closed conformations are written in bold.

Figure S2. Superimposition of experimental (black curves) and theoretical (red curves) SAXS data for representative calculations of populations for each of the runs indicated in Tables 2 and S7. The runs described in Table S7 and based on the SAXS data measured for scattering vectors q up to only 3.5 nm⁻¹ are labeled with stars. The plots were prepared using R 3.4.1. 24

Figure S3. On the next two pages. Superimposition of experimental ($M_{obs}$, black contours) and theoretical ($M_{theo}$, red contours) likelihood maps obtained by TALOS-N for each residue of Lnk. For $M_{obs}$, the TALOS-N inputs were the experimental chemical shifts, whereas for $M_{theo}$, the inputs were the chemical shifts calculated on the conformations 12, 13, 24, 140, 176 and averaged according to the populations displayed in Figure 3 in the main text. The plots were prepared using R 3.4.1.
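The two-stage selection described in the caption of Table S7 (gathering the conformations whose summed population over ten runs exceeds 0.01, then averaging over a second set of runs) can be sketched as follows. This is a minimal illustration with hypothetical names: it operates on populations already produced by a fitting program and does not call BioEn. The use of a population rather than sample standard deviation is an assumption.

```python
import math


def select_conformations(runs, threshold=0.01):
    """Keep conformations whose population summed over all runs exceeds the threshold.

    `runs` is a list with one dict per run, mapping conformation id -> population.
    """
    totals = {}
    for run in runs:
        for conf_id, population in run.items():
            totals[conf_id] = totals.get(conf_id, 0.0) + population
    return sorted(conf_id for conf_id, total in totals.items() if total > threshold)


def mean_and_std(values):
    """Mean and population standard deviation (ASSUMED convention) of a list."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


# Synthetic populations for two runs, for illustration only.
first_stage = [
    {13: 0.50, 20: 0.0005, 55: 0.4995},
    {13: 0.60, 20: 0.0005, 55: 0.3995},
]
kept = select_conformations(first_stage)
```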