Emergence of Alternative Structures in Amyloid Beta 1-42 Monomeric Landscape by N-terminal Hexapeptide Amyloid Inhibitors

Alzheimer’s disease (AD) is characterized by deposition of amyloid beta (Aβ) peptides into senile plaques in the brain. While most familial mutations are associated with early-onset AD, recent studies report the AD-protective nature of two genetic human Aβ variants, i.e. A2T and A2V, in the heterozygous state. The mixture of A2V Aβ1-6 (Aβ6) hexapeptide and WT Aβ1–42 (Αβ42) is also found neuroprotective. Motivated by these findings, in this study we investigate the effects of WT, A2V, and A2T Aβ6 hexapeptide binding on the monomeric WT Aβ42 landscape. For this purpose, we have performed extensive atomistic Replica Exchange Molecular Dynamics simulations, elucidating preferential binding of Aβ42 with the A2V and A2T hexapeptides compared to WT Aβ6. A notable reorganization of the Aβ42 landscape is revealed due to hexapeptide association, as manifested by lowering of transient interactions between the central and C-terminal hydrophobic patches. Concurrently, Aβ6-bound Aβ42 monomer exhibits alternative structural features that are strongly dependent on the hexapeptide sequence. For example, a central helix is more frequently populated within the A2T-bound monomer, while A2V-bound Aβ42 is often enhanced in overall disorder. Taken together, the present simulations offer novel molecular insights onto the effect of the N-terminal hexapeptide binding on the Aβ42 monomer structure, which might help in explaining their reported amyloid inhibition properties.


Model and Methods
Molecular dynamics simulation at various resolutions 1-6 serves as a useful tool for the structural characterization of intrinsically disordered peptides, such as the Amyloid beta (A). In this study we have performed extensive Replica Exchange Molecular Dynamics 7,8 (REMD) simulations to generate conformational ensemble of a single A42 monomer in presence of a WT, A2V, or A2T Avariant. REMD is an enhanced sampling technique that has been successfully used to generate robust structural ensembles for intrinsically disordered proteins, such as the Aβ peptide, that lack a single native conformation and instead exists as dynamic ensemble of rapidly interconverting conformations [9][10][11][12] . In REMD, several identical replicas (i.e. copies) of the system are simulated in parallel over a wide range of temperature. Conformations populated at different temperatures are exchanged at frequent intervals following a Metropolis Monte Carlo criterion, so that the principle of detailed balance is satisfied. Employing this method ensures that the system does not get trapped in a local minimum on the free energy landscape 7 .
The following protocol was used to set up the initial structure for the REMD simulation, which is similar to what was used in our earlier published simulations of the free A42 monomer variants 13 . A randomly collapsed A structure was generated from a short (~10 ps) MD simulation performed at 700 K in vacuum. The collapsed structure was then solvated, chargeneutralized by adding ions, energy minimized, and equilibrated for 1 ns in an NPT ensemble (300 K and 1 atm). The structure of the short WT A peptide was constructed using the first six residues of the full-length monomer as a template and ensuring that the phi and psi dihedral angles are consistent with a random coil structure. Next, the A monomer and the WT A hexapeptide were placed together in a 5.7 x 5.7 x 5.7 nm 3 sized cubic box that contains ~5,600 water molecules ( Fig. 1b, main text), resulting in an effective peptide concentration of 10 mM. The minimum distance (heavy atom only) between Aand A in the initial structure was at least 1.5 nm. The peptide termini were charged, and four Na + ions were added to charge-neutralize the system. A similar protocol was employed to generate the initial solvated structure of the A in presence of the A2V or A2T hexapeptide variant, in which the sidechain of residue 2 of the hexapeptide was accordingly mutated in silico using the Mutate plugin of the VMD software 14 . Prior to the REMD run, all solvated systems were first subjected to a 10,000-step energy-minimization, followed by a 10 ns long equilibration run in an NPT ensemble (300 K and 1 atm The system was coupled to a Nosé-Hoover heat bath to maintain constant temperature between the swaps. Periodic boundary conditions were applied and the particle-mesh Ewald (PME) method was used to treat long-range electrostatics 16 . A 1.0 nm cut-off distance was used for van der Waals interactions. The bonds were constrained using LINCS 17 and SETTLE 18 algorithms, and an integration time-step of 2 fs was used. A combination of OPLS-AA force field 19 and TIP3P water model 20 was used, which has been previously reported to generate disordered protein structural ensembles, such as A, in agreement with NMR experiments 21-23 . This combination is found to be well suited for simulating A fragments 24 and full-length A peptides 25 , and has been also used in our previously published A monomer and dimer simulations 13,26 . The simulations were performed using the GROMACS 4.5.4 software 27 in IBM BlueGene/Q supercomputers.

Conformational analysis:
The secondary structure of the peptide was calculated using the STRIDE program 28 . The C-C tertiary contact between a pair of residues was defined by using a distance cut-off of 0.8 nm between the C atoms. Only non-sequential contacts (|i-j| ≥3) were considered for tertiary interactions. A salt bridge was defined using a distance cut-off of 0.32 nm between the side-chain oxygen atom of an acidic residue and the side-chain nitrogen atom of a basic residue. An inter-peptide contact was defined by using a distance cut-off of 0.5 nm (heavy atoms only). The hexapeptide was considered to be bound to the A monomer in a structure, only if ≥ 25 intermolecular contacts were present. The A conformational ensemble was clustered using the Root Mean Squared Deviation (RMSD) as a similarity measure. The Daura algorithm 29 with a 0.45 nm C-RMSD cut-off between two conformations was used for clustering of the whole production ensemble (Fig. S2). All peptide structures were rendered using VMD software 14 .

Solvent Accessible Surface Area:
The hydrophobic solvent accessible surface area (SASA) was calculated using the VMD plugin with a probe radius of 1.4 Å 14 The average SASA value for the CHC-CTR -hairpin structure was calculated using the free A (structures corresponding to regions 6 and 7 of Fig. S6). The average SASA value for CTR-hairpin structures was obtained from the largest cluster of WT-bound and A2T-bound S4 sub-populations. For the NTR-CTR -sheet structures, the largest cluster of WT-bound S5 and A2V-bound S4 sub-populations were used.
Binding energy calculation: The average binding free energies (ΔGbind) between A and A were calculated using the MM-PBSA method 30,31 . In this technique, the total binding energy is a cumulative of the molecular-mechanics energy (van der Waals and electrostatics) and solvation (non-polar and polar solvation) energy terms. The MM-PBSA method does not account for the entropic contribution to binding. The non-polar solvation energy was estimated using a model based on a combination of solvent-accessible surface area and solvent accessible volume calculation 30,32 . The molecular dielectric surface was defined by a probe radius of 1.4 Å. The dielectric constants of the protein interior and the aqueous environment were set to 2 and 80, respectively, and the ionic concentration set to 150 mM. Standard deviations (SD) were estimated by using a bootstrap approach, and standard errors of mean (SEM) were computed from the SD derived from an ensemble of ~30,000 structures. PMF analysis and clustering: Potential of mean force (PMF, W(X)) was defined by the equation where X is a set of reaction coordinates and p(X) is the probability.
Following reaction coordinates were defined for this purpose: (i) the total number of residues from the CHC and CTR in an extended -strand conformation (NCHC+CTR); and (ii) the number of C contacts between CHC and CTR (NCCHC-CTR); both normalized to unity. Six sub-populations (S1 through S6) were defined on the PMF landscape that correspond to the regions having PMF energy < -4 kcal/mol (see Fig. 5, main text). The arbitrary boundaries used to define these sub-populations are reported in Table S3. The sub-populations that individually represent ≥10% of total production ensemble were further analyzed. An RMSD-based clustering was performed using the Daura algorithm 29 and a 0.3 nm pairwise C-RMSD cut-off on each of those sub-populations.
Error analysis: Standard errors of mean (SEM) were estimated for secondary, tertiary, and quaternary structural features, as well as for the relative population of S1-S6 states on the PMF landscape. These SEM values were obtained from standard deviations estimated by dividing the simulation data into two (or four) 35 ns long, non-overlapping blocks between 60 -130 (or 60-200) ns. The SEM in the probability difference plots (secondary and tertiary structure analyses, Figs. 2 and 3, main text) were estimated by following the standard rule of error propagation.

Equilibration and convergence assessment of REMD ensemble
Along with the Root Mean Square Deviation (RMSD) analysis described in the main text ( Fig. 1c), we performed the following evaluations to further validate the REMD ensemble. To ensure that the choice of initial peptide structure did not bias the final results, we checked the evolution of intermolecular distance between the centers of mass (COM) of the full-length peptide and the hexapeptide. The evolution of intermolecular distance for the 308.4 K replica is shown in Fig. S1. The interpeptide distance equilibrates to about 1.2 nm by 50 ns, and fluctuates steadily around this mean with a standard deviation of ± 0.4 nm. This observation suggests that the ensemble populated after 50 ns is independent of the initial conformation. Therefore, we considered the 60-200 ns portion of REMD trajectories for further analysis.
Next, we compared the probability distributions of (i) the radius of gyration (Rg); (ii) the number of intra-peptide CHC-CTR C contacts (NCHC-CTR); and (iii) the residue-wise turn propensity of A over two time intervals, 60 -130 ns and 60 -200 ns, in order to ensure that the simulations have reasonably converged. Results for the 308.4 K replica are shown in Fig. S1 and Table S1. For all three distributions, the overall range and the major features remain constant between the two time-intervals. The mean Rg value is 1.04 nm, 1.11 nm, and 1.05 nm for the WTbound, A2V-bound, and A2T-bound A and remain unchanged over the two time-windows of 60-130 ns and 60-200 ns. The average number of CHC-CTR contacts is about 12 for all three hexapeptide-bound systems. The turn propensity distributions are virtually indistinguishable between the different time windows, suggesting good convergence of the REMD trajectories.

Size and disorder of the production ensemble
The Rg distribution of the production ensemble is distributed around the mean value of 1.06 ± 0.12 nm for free, 1.05 ± 0.07 nm for WT-bound, 1.07 ± 0.10 nm for A2V-bound, and 1.05 ± 0.08 nm for A2T-bound A (Fig. S2). This value matches well with earlier experimental 33 and simulations studies 1,21,22 and is consistent with a collapsed globule. Interaction with WT or A2T hexapeptides increases the population around the mean Rg value, while the distribution corresponding to the A2V-bound peptide is similar to that of the free A monomer. Since A is an IDP, we performed a RMSD-based clustering for the A ensembles to compare the structural disorder (see Methods and Fig. S2). It is interesting to observe that the population distribution of the largest 50 clusters is similar for all four A systems, indicating that the overall structural heterogeneity is not strongly affected by the NTR hexapeptide binding.

Register-shifted -hairpin within A2V S5 structures
Analysis of A2V-bound S5 structures that present 11.6% of the total ensemble reveals a strand rich population (Fig. S9). The -hairpin is comprised of -strands involving residues 18-22 and 27-31 and a short turn region around residues 23-26. This structure is different from the transient CHC-CTR hairpin often observed within free A S6 structures (Fig. S6), as the hairpin within A2V-bound S5 A shows a register shift towards C-terminus and a shorter turn. The hexapeptide mainly interacts through residues D1 and V2 with NTR, CHC or CTR residues of the A42 monomer.

Representative structure of S1-S6 sub-population
The most representative structure of each significant (>10%) sub-population of free and WT/A2V/A2T bound A is shown in Figure S10. While the free and the A2V-bound peptides in S1 are overall unstructured, the A2T-bound S1 structure has the preCHC/CHC helix. The WTbound S2 structure also acquires the preCHC/CHC helix conformation, while the A2V-bound S2 remains unstructured. Only the A2T-bound peptide shows a significant population of the S3 region and frequently visits the preCHC/CHC helix conformation. The free A structure from the S4 region forms a -hairpin with residues 23-40 with a short helical turn in the CHC region. While the WT-bound and A2T-bound S4 structures are comprised of the CTR-hairpin, A2V-bound S4 structure forms the NTR-CTR -sheet. Within S5 sub-population, WT-bound structure also forms the NTR-CTR -sheet, while A2V-bound structure acquires the register-shifted -hairpin conformation. The S6 A structure (bound or free) consistently exhibit the disease-implicated CHC-CTR hairpin conformation. Taken together, our analysis confirms the sequence-dependent effect of the hexapeptide interaction on the A structure.        Figure S10: Representative structures from each sub-population S1 -S6, that is populated above 10%. The color code is the same as that used in Fig. S7. Table S1: REMD convergence assessment, as estimated from results (mean ± standard deviation) obtained over 60-130 ns and 60-200 ns time-window. Table S2: The total interpeptide binding free energy, as well as contribution of individual energy component, as estimated using the MM-PBSA method (in kcal/mol; ± standard error of mean). ∆EVdW = van der Waals; ∆Eelec = electrostatic; ∆Esolv = polar solvent; ∆ESASA = solvent-accessible surface area; ∆ESAV = solvent-accessible volume terms. ∆Gbind = total binding free energy. Table S3: Boundaries of S1-S6 sub-populations on the PMF landscape.