The hot sites of α-synuclein in amyloid fibril formation

The role of alpha-synuclein (αS) amyloid fibrillation has been recognized in various neurological diseases including Parkinson’s Disease (PD). In early stages, fibrillation occurs by the structural transition from helix to extended states in monomeric αS followed by the formation of beta-sheets. This alpha-helix to beta-sheet transition (αβT) speeds up the formation of amyloid fibrils through the formation of unstable and temporary configurations of the αS. In this study, the most important regions that act as initiating nuclei and make unstable the initial configuration were identified based on sequence and structural information. In this regard, a Targeted Molecular Dynamics (TMD) simulation was employed using explicit solvent models under physiological conditions. Identified regions are those that are in the early steps of structural opening. The trajectory was clustered the structures characterized the intermediate states. The findings of this study would help us to better understanding of the mechanism of amyloid fibril formation.


Results and discussion
The hot sites of αS. To find the first opening regions of conformation transitions of αS, maps of structural features against the residual numbers in the TMD simulations. Figure 1a1 shows local deformations compared to the initial configuration of the protein in the simulations. Red color indicates high changes in conformations while stable parts are in dark blue. The change in local gyration radius ( R g ) and the number of Hydrogen Bonds ( HB ) are also shown in Figs. 2a1 and 3a1, respectively. Figures 1, 2, and 3a1, all show similar patterns, indicating some regions that respond to deformations at initial stages. The end part of the protein (with residue numbers of after 90) having a very flexible coiled structure showed an irregular pattern, in contrast with structured parts of the protein. Longer bonds indicate the sites opened at early steps and formed the seeds for destroying the helical structure.
During the simulations, mentioned sites were opened in the following order: (i) the regions of (35)(36)(37)(38)(39)(40)(41)(42)(43) and (47)(48)(49)(50)(51)(52)(53)(54)(55), (ii) the regions of (65-75), and (iii) the region of (83-90) as highlighted by red, yellow, and cyan colors, respectively. The first opening regions (the hot regions) in tertiary, secondary and primary structures of αS are shown in Fig. 4a-c, respectively. Figure 4d shows the snapshots related to one of the TMD simulations in every 25 ns to better understand α-β conformational transition. There is a visible time priority of the opening regions (hot regions) in the protein dynamic along with the TMD simulation. The first opening sites are the regions in which significant structural changes occurred during αβT. The opening of these sites caused an increase in the local gyration radius and disappearance of the hydrogen bonds facilitating the transition of unstable conformations to the extended state. To investigate the thermodynamic stability of the results, all simulations were repeated for 300 K as reported in Fig. S1a1-a3 of the Supplementary Section. The RMSD, ( R g ), and HB were compared with those in 310 K and the results have been presented in Fig. S1b1-b3 respectively. There is a small change of RMSD in the hot regions while no significant changes were observed in ( R g ), and HB plots. However, the position and priority of the hot regions are conserved in both temperatures of 300 K and 310 K, indicating the results are thermally stable and are expected to be visible in in-vitro experiments.
To ensure the irreversibility of the protein conformational transition, TMD simulations were repeated in reversed direction as reported in Supplementary Section. The RMSD, ( R g ), and HB are obtained which show irregular patterns in the reversed direction of αβT, and explain the different pathways in the forward and backward directions (Fig. S2).
Properties of hot sites. According to Anfinsen's experiments, the amino acid sequence specifies the tertiary structure of proteins 28 . Propensities or conformational potentials were obtained from statistical analysis of secondary structure proteins, as the ratio of fractional occurrence of the residue in the given type of secondary structure to the fractional occurrence in all structures. According to the Chou-Fasman method 29 , the propensity of each residue in three types of secondary structures (Helix, extended, and coil) was calculated by Eq. (1).
where R and S are amino acids and secondary structures; and N S and N are total number of amino acids in conformation S and the total number of amino acids in all secondary structures, respectively. Also, F(R.S) indicates the number of occurrence of R in S , and P(R.S) is obtained as the propensity of R amino acid to be in S structure.
Based on this principle, Chou and Fasman described three classes of the residue propensities in three types of protein secondary structures for the first time. The dataset used by Chou-Fasman for computing propensities of the amino acids was only limited to 15 proteins and 2,473 amino acids 29 . Over the years, the volume of datasets used to calculate the Chou-Fasmans̓ parameters has increased and finally the last applied dataset included a number of 2,164 proteins 30 . Since today, a number of nearly 150,000 proteins have been identified in the Protein Data Bank (PDB) database; and we updated the propensity of each amino acid in three types of www.nature.com/scientificreports/ secondary structures in the protein dataset consisting of more than 3,500 unique protein chains. Table 1 shows the propensity of each amino acid in the three types of secondary structures.
Chameleonicity of hot sites. The chameleon site is defined as a distinct sequence that tends to be present in different secondary structure types of protein, meaning that these sites can adapted to different structures in response to their environment 31 . There are two major conditions for chameleon sequences; sequence propensity value to the beta structure should be more than 1 ( P β > 1 ; and secondary structure of protein should be in helix or coil conformations 32 . Therefore, αS helical regions that tend to have more than one beta value are good candidate of these regions. To identify chameleonic sites, propensities to extended conformations were averaged over the sliding windows (Fig. 5a). The regions of (14-18), (35)(36)(37)(38)(39)(40)(41)(42), (46)(47)(48)(49)(50)(51)(52)(53)(54)(55)(56)(57), (61)(62)(63)(64)(65)(66)(67)(68)(69)(70)(71)(72)(73)(74)(75)(76)(77)(78)(79)(80), and (90-94) were identified as chameleon sites with a high tendency for beta structure. These regions are the most likely to form the β-strands in αβT. A comparison of the hot regions (Fig. 5b) indicates the chameleonicity of these regions which help the protein to lose the helical configuration at the onest of the protein αβT. These sites are rich in valine and glycine residues, which together form a specific pattern called the G-V pattern (Fig. 5b). The G-V pattern gives high flexibility to hot regions that play key roles in conformational transition, which as described in "Role of G-V pattern in hot sites". www.nature.com/scientificreports/ Hydrophobicity of hot sites. Alternating polar and nonpolar residues create alternating hydrophobic and hydrophilic faces in the protein and facilitate beta-strand formation 33 . There are different alternating polar and nonpolar ( N/P ) patterns in aggregation-prone proteins 34 , which diversities in N/P pattern causing specific beta-sheet forms under different conditions 35,36 . The αS sequence has alternating polar and nonpolar sites stimulate the protein to form aggregates and nonpolar sites usually have high hydrophobicity that results in the tendency to formation of amyloid fibrils 37 . According to the Roseman's hydropathy scales 38 , Fig. 5c shows hydrophobicity values averaged over the sliding windows. The regions of (6-7), (16)(17), (37)(38)(39)(40), (49)(50)(51)(52)(53), (66)(67)(68)(69)(70)(71)(72)(73)(74)(75)(76), and (87-93) with values over 0 were identified as hydrophobic sites of αS. These hydrophobic regions are at the heart of hot sites and make them more potent for the formation of amyloid fibrils. In fact, these middle hydrophobic cores can act as a driving force for β-strand formation 39 leading to the initiation of αβT from these sites and subsequent sites. Figure S3 shows the relationship between hydrophobicity and propensity values. The values of 0.61 and − 0.37 were obtained from the correlation between hydrophobicity and P β and P α , respectively. This means that, hydrophobic sites tend to lose helical structures and convert them to extended structures. Although, both reported correlation values are not very high but they are significant at p-value of < 0.001.

Role of G-V pattern in hot sites.
The valine is an aliphatic and hydrophobic amino acid with the highest propensity to the beta structure in comparison with the other amino acids 30 . As the valine is small and has a Figure 2. The difference between gyration radius ( R g ) of each frame configurations with the first frame during TMD trajectories. The figures in left panels show kymographs of three sets of the TMD simulations, protein wild type in TIP3P water model (TIP3P-WT), mutated protein in TIP3P water model (TIP3P-MT) and protein wild type in TIP4PD water model (TIP4PD-WT). Color bar in the right side indicates the values and the locations of the hot sites are colored as red, yellow and cyan in the left side of the plots indicating the first, second and third priorities, respectively. B1 compares the average R g for valine and alanine residues in the hot sites (see text) the black arrows point to the most different of R g between the valine and alanine residues. B2 compares time averaged R g of the residues for the three sets of the simulations.
As it is more difficult to adopt valine-rich regions (hot regions) with the regular α-helical conformation these regions prefer to be in beta-sheet states. Absence of ring in the valine side-chain creates the main-chain amid hydrogens (NH), which have not been protected against solvent hydrogens 40 resulting in the smallest environmental changes around these regions that make them to convert the helical state to an extended one. Therefore, the presence of valine in hot regions provides an intrinsic tendency to lose helical structure and make these regions highly susceptible to long-term conformational transition. It is noteworthy that, the glycine is placed next to the valine residues in the valine-rich sites. Figure 5b indicates location of valine residues (black bars) and V-G pairs (green bars) over the protein sequence. There is a concentration of black and green lines in the hot regions which indicates the role of G-V pattern in the formation of these sensitive sites. The G-V sites are ending part of repetitive sequence of KTKEGV known to be able to form helical structure upon binding of the protein to the mitochondrial membrane 42 . It appears that the presence of the G-V sites acts as a key part of initiating conformational transition from helical to extended structures. The presence of glycine next to the valine results from intrinsic behavior of glycine in the ϕ/ψ space. Intrinsic behavior of amino acids plays a major role in their conformational preferences in the ϕ/ψ space 43,44 creating a set of dihedral angles to special values that form secondary structure types 45 and is identified in special regions of the Ramachandran plot 46 . Intrinsic behavior of glycine allows its ϕ and ψ angle values to fall in wide range 47 and its presence next to the valine causes it to act as hinge donating that stimulates the G-V regions during the αβT. www.nature.com/scientificreports/ All the G-V regions of hot sites were identified and their behavior was investigated in the ϕ/ψ space. The pictures in Fig. 6 represent dihedral angle values of G-V regions in hot sites during the TMD simulations. Distributions of valine dihedral angle values were in helix and beta areas of the plot, while the glycine ones started from the helix region and continued in the other parts of Ramachandran plot during the TMD simulation. This means that the presence of the glycine alongside the valine residue gives a great deal of flexibility in hot sites and made conformational transition from helical to extended states more convenient than the other parts of protein.
As shown in Fig. 7a1 and a2, the ψ and ϕ angle values indicate a two-state structure for valine while the glycine ones are more fluctuating. To ensure the valine effect on the αβT, the valine residues of the G-V regions were mutated to the Alanine residues and TMD was performed on mutated protein.
The alanine scanning of G-V sites. The alanine scanning 48 is a useful technique used to determine the contribution of valine residue to the guidance of helical structures towards extended structures. Since the alanine is a www.nature.com/scientificreports/ non-bulky and chemical inert amino acid 30 with highest propensity to the helix structure (See Table 1), it was selected as the best candidate instead of valine. To understand role of the valine in hot regions, all the valine residues in the G-V sites were mutated to the alanine and TMD simulation was performed on the αS mutated structure. Plots for the RMSD , ( R g ), and HB were obtained and the results have been presented in Figs. 1, 2, and 3a2. As can be seen, valine mutation to the alanine residue decreased flexibility of hot sites and reduced color pattern of hot bands showing the αβT is less favorable compared to the wild type. The tendency of the protein to keep the helical structure reduced fluctuation of dihedral angles and αβT occurred much later for mutant protein (shadow regions in Fig. 7b1, b2). Therefore, the intrinsic propensity of each amino acid in a variety of secondary structures has a significant effect on conformational transition of the protein.
More precisely, the mean RMSD , ( R g ), and HB of valine and alanine residues of the hot regions compared between the wild and mutant variants (Figs. 1, 2, 3b1). The RMSD values of mutant protein are smaller than in the wild type, indicating the mutate protein retains the helical structure more than the wild type. The difference of ( R g ) values between the wild and mutant types after 350 ns shows the tendency of the alanine residues to be in helical structure. Also, smaller values of wild type HB s indicate that the helical state in protein clears the protein faster than the mutant form. The mean time of the variables for each residues has also been plotted to compare the behavior of the protein (Figs. 1, 2, 3b2). In general, the mutate protein exhibits different behaviors across the RMSD , ( R g ), and HB curves (Specially the hot regions) compare to the wild type.

Influence of TIP4PD water on protein αβT.
To investigate the effect of the water model on the protein conformational transition, TMD simulations are repeated with TIP4PD water model for the protein wild type. The TIP4PD water model reproduced the most accurate conformational ensemble for intrinsically disordered proteins which is recommended for simulating of disordered proteins 49 . The RMSD , ( R g ), and HB plots were obtained and the results have been presented in Figs. 1, 2, and 3a3 respectively. Although, the TIP4P-D water model did not affect the location of hot regions but a moderate change was observed in the color pattern in Figs. 1, 2, and 3a3 compared to the simulations in TIP3P water model. For a better comparison, the Figs. 1, 2, and 3b2, show the average values along the simulations. The smaller values of RMSD , ( R g ), and HB in the hot regions indicate that the TIP4PD water model helps to preserve more the helical structure of these regions along the protein conformational transition and the TIP4PD water model has delayed the opening of the second region more than the other hot sites. However, the position and priority of the hot regions are conserved and the conclusion of the paper do not change. The low sensitivity of the hot regions indicates that the amino acid propensity to the secondary structure types plays a more dominant role than the water model in the conformational transition which shows TIP4PD water model effects on αβT process speed.

Conformational clusters in αβT pathway.
The dPCA was applied to the TMD trajectories to characterize significant conformers during αβT. As shown in Fig. 8a, the eigenvalue contribution of dPC indicates that the first dPC is accounted for more than 70% of the overall variance and over 85% of motions are covered in Table 1. The propensity of amino acids to be prefer in secondary structure types. The amino acid names are shown in 1-letter characters. The P α , P β and P C are the propensity values of amino acids in helix, extended and coil structures, respectively. www.nature.com/scientificreports/ the first three components. Therefore, the dPC space is defined by the first three dPC s and conformations are clustered into three clusters. The clustering was performed using the peak-picking algorithm to identify isolated peaks distributed in principal components of the configurations along the TMD trajectories, corresponding to discrete clusters 50 . According to Fig. 8b, three clusters with the population of 18.34%, 47.20% and 13.40% are respectively shown in green, red, and cyan colors in dPC space. Green and cyan clusters contain the configurations that are respectively close to helical (initial) and extended (target) states. A centroid point of each group was selected as representative conformation and their RMSD compared to initial and target configurations are reported in Table 2. An accurate view of the RMSD values indicated the structural similarity between representatives. The representative of the second cluster could be between the helical and extended states. This cluster with 47.20% of total population showing the most configurations during protein conformational transition. These configurations are related to the early stages in αβT and first and second priorities of hot sites are active in them. While the configurations of the first and third clusters are more similar to the helical and extended states, respectively. It can also be seen that the hot regions with the highest priority are active in the first cluster configurations while all the hot sites are involved in the third cluster configurations (see Fig. 8c).

Conclusions
In this study, a comparison was found for monomeric αβT of αS in different conditions using TMD simulations.
The increasing in the gyration radius and the decreasing the number of hydrogen bond in hot regions resulted in the formation of unstable and temporary conformations. The results indicated that G-V patterns play a major

Methods
Selecting of starting and ending points in α-β transition. TMD simulations are needed to select two stable structures for starting and ending points. The helical state is structured as well as its full-length PDB file is available in the Protein Data Bank (determined using nuclear magnetic resonance (NMR) spectroscopy, pdb ID: 2KKW 54 ). The structure of αS fibrils at atomic resolution (pdb ID: 2N0A 55 ) was selected for the ending point of TMD simulations. This structure was determined using nuclear magnetic resonance (NMR) spectroscopy containing full-length protein chains in extended states. This selection represents a single chain that interacts with other amyloid fibrils chains. This helps TMD simulations to sample the helical transition to amyloid fibril structure. Since we are locally focused on only one chain of amyloid fibrils, polymorphic properties do not affect the results (See Supplementary Fig. S4).
TMD procedure. The αS structures in both helical and extended conformations were obtained from the Protein Data Bank (PDB) database. The first model of NMR structure with PDB ID of 2KKW and the first chain of NMR structure with PDB ID of 2N0A were selected as helical (folded) and extended (amyloid) structures, respectively. In this study, 3 TMD 26 simulations were performed to focus on the monomeric conformational transition of αS. The protein alpha-beta transition and reversed transition were investigated to find critical regions in the structural transitions and for understanding the conditions to create β-forming regions. In the αβT simulations, helical structure (based on PDB ID of 2KKW) was considered as initial conformation forced toward extended configuration, and in reversed transition, extended structure (based on PDB ID of 2N0A) was considered as initial structure.
Moreover, another set of simulations was performed on the αS mutated structure to prove that valine plays a key role in protein conformational transition. All the valine residues at the hot sites (see section "The alanine scanning of G-V sites") of αS helical structure (based on PDB ID of 2KKW) mutated to the alanine were modeled by MODELLER 9.20 56 and energy was minimized in constructed 3D model.
TMD was performed in cubic box of 9,086 water molecules which was neutralized by the addition of 9 Na + ions. TIP3P 57 was used to model water molecules and CHARMM27 force-field parameters were applied. In all simulations, 50,000 minimization steps of the conjugated gradient were done for frozen protein Cα atoms with a positional harmonic force of 10 kcal mol −1 Å 2 and heating up to 310 K over 300 ps for NPT ensemble. The Langevin thermostat 58 and the Nose-Hoover barostat 59 were applied to keep temperature and pressure at 310 K and 1.01325 bar, respectively. A short-range cutoff of 10 Å was treated for non-bonded interactions, and longrange electrostatic interactions were considered using the Particle Mesh Ewald (PME) method 60 combined with periodic boundary conditions. At initial stages of equilibration, a time step of 1 fs was done while, the SETTLE www.nature.com/scientificreports/ algorithm 61 was used to keep hydrogen-heavy atom bond lengths frozen during simulation at subsequent steps thus, a time period 2 fs was adopted. After equilibration, all the restraints were gradually removed and TMD was carried out for 500 ns. During the TMD simulations, our NPT ensemble included an additional energy term based on the RMSD of protein residues that force the molecule concering a prescribed target structure. The time-dependent energy function was as follows.  www.nature.com/scientificreports/ where, N represents the number of C α atoms in protein backbone, K is harmonic force constant set as 200 kcal mol −1 Å −2 , and R is the Root Mean Square Deviation (RMSD) between a conformation at time t and target conformation. ρ(t) is reference RMSD value at time t that linearly decreased from 24.22 to 0 Å within the TMD simulation time. The TMD forces were applied to the alpha carbons during the simulation. The center of mass and protein orientation was fixed during the simulation to prevent molecular rotation. The NAMD 2.13 program 62 was utilized for the TMD simulations and all related analyses were done using the VMD 1.9.3 27 .
To investigate the effect of water model on protein conformational transition, similar TMD simulations are performed using the TIP4PD water model 49 . Since the updated version of CHARMM force field by Mackrell published in 2019, July (https ://macke rell.umary land.edu/charm m_ff.shtml ) didn't have the parameters and topology files for TIP4PD water model, we applied the TIP4P-2005 files and modified the charge, energy ( ε ) and the minimum distance (R min ) of the water atoms according to the parameters reported in the David Shaw's paper 49 . A small TIP4PD water box (100 Å per dimension) was fabricated using the PACKMOL package 63 and relaxed with a 2 ns regular MD simulation. This obtained box is applied to make the solvent box for the system. The previous standard TMD simulations protocol was performed. TMD simulations performance decreased by %10 due to the use of the TIP4PD water model.
Applying the sliding window to fragmental analysis. To consider the effect of neighbor residues, all the properties were statistically averaged over a sliding window of residues along the protein chain. The average value of every property was assigned to the middle residues of each sliding window in the αS sequence. The size of the sliding window can be between 3 and 10 residues 64 but since, in biological concentration, the probability of amyloid formation is low for small protein fragments, larger sizes are more suitable for window selection 65 . In this paper, number of 7 was selected as the size of the sliding window (See Supplementary Fig. S5). So, every feature was computed using the sliding window, and has been issued for the 4th to 137th residues of αS. The sliding window was used for fragmental averaging of the protein propensity, hydrophobicity, gyration radius, and the number of hydrogen bonds.

Analysis of TMD trajectories.
To understand conformational changes during the protein αβT, several types of analyses were performed on the TMD trajectories. The radius of gyration ( R g ) and RMSD values of the alpha carbons, Hydrogen bonds (the bonds with a bond length cutoff of 3.0 Å and an angle cutoff of 20°), and dihedral angles of key residues of critical sites for each frame of TMD trajectory were calculated over sliding windows using VMD 27 .
Principal Component Analysis of backbone dihedral angles (dPCA ) of the TMD trajectory was performed in CARMA version 1.7 66 . The clustering method was performed based on a peak-picking algorithm 67 embedded in the CARMA applied to three-dimensional distributions of principal components derived from the TMD trajectory. The first three principle components (three largest dPC s) were considered to identify prominent molecular configurations for populated clusters in three-dimensional dPC space.